Abstract. Tree automata with one memory were introduced in 2001. They generalize
both pushdown (word) automata and the tree automata with constraints of equality
between brothers of Bogaert and Tison. Though this model has a decidable emptiness
problem, its main weakness is its lack of good closure properties.

We propose a generalization of the visibly pushdown automata of Alur and Madhusudan
to a family of tree recognizers which carry along their (bottom-up) computation an
auxiliary unbounded memory with a tree structure (instead of a symbol stack). In other
words, these recognizers, called Visibly Tree Automata with Memory (VTAM), define a
subclass of tree automata with one memory enjoying Boolean closure properties. We show
in particular that they can be determinized and that problems such as emptiness,
membership, inclusion and universality are decidable for VTAM. Moreover, we propose
several extensions of VTAM whose transitions may be constrained by different kinds of
tests between memories and also by constraints à la Bogaert and Tison. We show that some
of these classes of constrained VTAM keep the good closure and decidability properties,
and we demonstrate their expressiveness with relevant examples of tree languages.



Introduction 

The control flow of programs with calls to functions can be abstracted as pushdown
systems. This allows one to reduce some program verification problems to problems (e.g.
model-checking) on pushdown automata. When it comes to functional languages with
continuation passing style, the stack must contain information on continuations and has
the structure of a dag (for jumps). Similarly, in the context of asynchronous concurrent
programming languages, for two concurrent threads the ordering of return is not determined
(synchronized) and these threads cannot be stacked. In these cases, the control flow is
better modeled as a tree structure rather than a stack. That is why we are interested in
tree automata with one memory, which generalize the pushdown (tree) automata, replacing
the stack with a tree. Here, a "memory" has to be understood as a storage device, whose
structure is a tree. For instance, two memories would correspond to two storage devices
whose access would be independent.

The tree automata with one memory introduced in [7] compute bottom-up on a tree,
with an auxiliary memory carrying a tree, as in former works such as [14]. Along a
computation, at any node of the tree, the memory is updated incrementally from the memory
reached at the sons of the node. This update may consist in building a new tree from the
memories at the sons (this generalizes a push) or retrieving a subtree of one of the memories
at the sons (this generalizes a pop). In addition, such automata may perform equality tests:
a transition may be constrained to be performed only when the memories reached at some
of the sons are identical. In this way, tree automata with one memory also generalize certain
cases of tree automata with equality and disequality tests between brothers.

Automata with one memory were introduced in the context of the verification of
security protocols, where the messages exchanged are represented as trees. In the context of
(functional or concurrent) programs, the creation of a thread, or a callcc, corresponds to a
push, and the termination of a thread or of a callcc corresponds to a pop. The emptiness
problem for such automata is in EXPTIME (note that for the extension with a second memory
the emptiness problem becomes undecidable). However, the class of tree languages defined by
such automata is closed neither under intersection nor under complement. This is not
surprising as they are strictly more general than context-free languages.

On the other hand, Alur and Madhusudan have introduced the notion of visibility for
pushdown automata [2], which is a relevant restriction in the context of control flow analysis.
With this restriction, determinization is possible and actually the class of languages is closed
under Boolean operations.

In this paper, we propose the new formalism of Visibly Tree Automata with Memory
(VTAM). On one hand, it extends visibly pushdown languages to the recognition of trees,
and with a tree structure instead of a stack, following former approaches [14, 21, 10]. On the
other hand, VTAM restrict tree automata with one memory, imposing a visibility condition
on the transitions: each symbol is assigned a given type of action. When reading a symbol,
the automaton can only perform the assigned type of action: push or pop.

We first show in Section 2 that VTAM can be determinized, using a proof similar to
the proof of [2], and do have the good closure properties. The main difficulty here is to
understand what a good notion of visibility is for trees, with memories instead of stacks. We
also show that the problems of membership and emptiness are decidable in deterministic
polynomial time for VTAM.

In the second part of the paper (Section 3), we extend VTAM with constraints. Our
constraints here are recognizable relations; a transition can be fired only if the memory
contents of the sons of the current node satisfy such a relation. We then give a general
theorem, expressing conditions on the relations which ensure the decidability of emptiness.
Such conditions are shown to be necessary on the one hand, and, on the other hand, we prove
that they are satisfied by some examples, including syntactic equality and disequality tests
and structural equality and disequality tests. The case of VTAM with structural equality
and disequality tests (this class is denoted VTAMp) is particularly interesting, since the



VISIBLY TREE AUTOMATA WITH MEMORY AND CONSTRAINTS 






determinization and closure properties of Section 2 carry over to this generalization, which
we show in Section 3.4.2. The automata of VTAMp also enjoy a good expressive power, as
we show in Section 3.7 by presenting some non-trivial examples of languages in this class:
well-balanced binary trees, red-black trees, powerlists...

As an intermediate result, we show that, in case of equality tests or structural equality 
tests, the language of memories that can be reached in a given state is always a regular 
language. This is a generalization of the well-known result that the set of stack contents 
in a pushdown automaton is always regular. To prove this, we observe that the memory
contents are recognized by a two-way alternating tree automaton with constraints. Then we
show, using a saturation strategy, that two-way alternating tree automata with (structural) 
equality constraints are not more expressive than standard tree automata. 

Finally, in Section 4 we propose a class of visibly tree automata, which combines the
structural constraints of VTAMp, testing memory contents, with Bogaert-Tison constraints
of [1] (equality and disequality tests between brother subterms) which operate on the term
in input. We show that the tree automata of this class can be determinized, are closed 
under Boolean operations and have a decidable emptiness problem. 

Related Work. Generalizations of pushdown automata to trees (both for input and stack)
are proposed in [14, 21, 10]. Our contributions are the generalization of the visibility
condition of [2] to such tree automata (our VTAM, without constraints, strictly generalize
the VP Languages of [2]), and the addition of constraints on the stack contents. The visibly
tree automata of [1] use a word stack, which is less general than a tree structured memory,
but the comparison with VTAM is not easy as they are alternating and compute top-down
on infinite trees.

Independently, Chabin and Rety have proposed [5] a formalism combining pushdown
tree automata of [14] with the concept of visibly pushdown languages. Their automata
recognize finite trees using a word stack. They have a decidable emptiness problem and the
corresponding tree languages (Visibly Pushdown Tree Languages, VPTL) are closed under
Boolean operations. Following remarks of one of these two authors, it appeared that VTAM
and VPTL are incomparable, see Section 2.2.

1. Preliminaries 

1.1. Term algebra. A signature $\Sigma$ is a finite set of function symbols with arity, denoted
by $f, g, \ldots$ We write $\Sigma_n$ for the subset of function symbols of $\Sigma$ of arity $n$. Given an
infinite set $\mathcal{X}$ of variables, the set of terms built over $\Sigma$ and $\mathcal{X}$ is denoted
$\mathcal{T}(\Sigma, \mathcal{X})$, and the subset of ground terms is denoted $\mathcal{T}(\Sigma)$. The set of variables
occurring in a term $t \in \mathcal{T}(\Sigma, \mathcal{X})$ is denoted $vars(t)$. A substitution $\sigma$ is a mapping
from $\mathcal{X}$ to $\mathcal{T}(\Sigma, \mathcal{X})$ such that $\{x \mid \sigma(x) \neq x\}$, the support of $\sigma$, is finite. The
application of a substitution $\sigma$ to a term $t$ is written $t\sigma$. It is the homomorphic
extension of $\sigma$ to $\mathcal{T}(\Sigma, \mathcal{X})$. The positions $Pos(t)$ in a term $t$ are sequences of positive
integers ($\Lambda$, the empty sequence, is the root position). A subterm of $t$ at position $p$ is
written $t|_p$, and the replacement in $t$ of the subterm at position $p$ by $u$ is denoted $t[u]_p$.
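
As a minimal sketch of these notions (not part of the original development), terms can be represented as nested Python tuples `(symbol, child_1, ..., child_n)`. This representation and the helper names `positions`, `subterm` and `replace` are assumptions made here purely for illustration.

```python
# Assumed representation: a term is a tuple (symbol, child_1, ..., child_n);
# a constant is the 1-tuple (symbol,). A position is a tuple of 1-based child
# indices, and the empty tuple () plays the role of the root position.

def positions(t):
    """Return all positions of the term t, root position first."""
    result = [()]
    for i, child in enumerate(t[1:], start=1):
        result.extend((i,) + p for p in positions(child))
    return result

def subterm(t, p):
    """Return the subterm of t at position p."""
    for i in p:
        t = t[i]  # t[0] is the symbol, so child i sits at index i
    return t

def replace(t, p, u):
    """Return t with the subterm at position p replaced by u."""
    if not p:
        return u
    i = p[0]
    return t[:i] + (replace(t[i], p[1:], u),) + t[i + 1:]

# Example term f(a, g(b)).
example = ("f", ("a",), ("g", ("b",)))
```

For instance, `subterm(example, (2, 1))` is the constant `("b",)`, and `replace(example, (2,), ("a",))` builds the term f(a, a).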
1.2. Rewriting. We assume standard definitions and notations for term rewriting [11]. A
term rewriting system (TRS) over a signature $\Sigma$ is a finite set of rewrite rules $\ell \to r$,
where $\ell \in \mathcal{T}(\Sigma, \mathcal{X})$ and $r \in \mathcal{T}(\Sigma, vars(\ell))$. A term $t \in \mathcal{T}(\Sigma, \mathcal{X})$ rewrites to $s$
by a TRS $\mathcal{R}$ (denoted $t \to_{\mathcal{R}} s$) if there is a rewrite rule $\ell \to r \in \mathcal{R}$, a position $p$ of
$t$ and a substitution $\sigma$ such that $t|_p = \ell\sigma$ and $s = t[r\sigma]_p$. The transitive and
reflexive closure of $\to_{\mathcal{R}}$ is denoted $\to^*_{\mathcal{R}}$.

1.3. Tree Automata. Following definitions and notation of [8], we consider tree automata
which compute bottom-up (from leaves to root) on (finite) ground terms in $\mathcal{T}(\Sigma)$. At each
stage of computation on a tree $t$, a tree automaton reads the function symbol $f$ at the current
position $p$ in $t$ and updates its current state, according to $f$ and to the respective states
reached at the positions immediately under $p$ in $t$. Formally, a bottom-up tree automaton
(TA) $\mathcal{A}$ on a signature $\Sigma$ is a tuple $(Q, Q_f, \Delta)$ where $\Sigma$ is the computation signature,
$Q$ is a finite set of nullary state symbols, disjoint from $\Sigma$, $Q_f \subseteq Q$ is the subset of final
states and $\Delta$ is a set of rewrite rules of the form $f(q_1, \ldots, q_n) \to q$, where $f \in \Sigma_n$
and $q_1, \ldots, q_n, q \in Q$. A term $t$ is accepted (we may also write recognized) by $\mathcal{A}$ in state
$q$ iff $t \to^*_{\mathcal{A}} q$, and the language $L(\mathcal{A}, q)$ of $\mathcal{A}$ in state $q$ is the set of ground terms
accepted in $q$. The language $L(\mathcal{A})$ of $\mathcal{A}$ is $\bigcup_{q \in Q_f} L(\mathcal{A}, q)$. A set of ground terms
is called regular if it is the language of a TA.
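
The bottom-up computation can be sketched as follows. This is an illustration under assumptions, not the formal definition: terms are nested tuples `(symbol, child_1, ..., child_n)`, and the transition table `rules` (a hypothetical name) maps a pair `(f, (q_1, ..., q_n))` to the set of states `q` such that a rule f(q_1, ..., q_n) → q exists.

```python
from itertools import product

def reachable_states(t, rules):
    """Set of states q such that the ground term t rewrites to q."""
    # States reachable at each son, computed first (bottom-up).
    child_sets = [reachable_states(child, rules) for child in t[1:]]
    states = set()
    # Try every combination of states reached at the sons.
    for qs in product(*child_sets):
        states |= rules.get((t[0], qs), set())
    return states

def accepts(t, rules, final):
    """True iff t is accepted in at least one final state."""
    return bool(reachable_states(t, rules) & final)

# Hypothetical example: Boolean terms over 0, 1 and a binary 'and';
# state q1 means "evaluates to true", q0 means "evaluates to false".
AND_RULES = {
    ("0", ()): {"q0"},
    ("1", ()): {"q1"},
    ("and", ("q1", "q1")): {"q1"},
    ("and", ("q1", "q0")): {"q0"},
    ("and", ("q0", "q1")): {"q0"},
    ("and", ("q0", "q0")): {"q0"},
}
```

With final states `{"q1"}`, the term and(1, 1) is accepted while and(1, 0) is not. For a constant, `product()` over no arguments yields the single empty tuple, so the constant rules are looked up with the key `(symbol, ())`.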



2. Visibly Tree Automata with Memory 

We propose in this section a subclass of the tree automata with one memory [7] which 
is stable under Boolean operations and has decidable emptiness and membership problems. 

2.1. Definition of VTAM. Tree automata have been extended [14, 21, 10, 7] to carry
unbounded information along the states in computations. In [7], this information is stored
in a tree structure and is called memory. We keep this terminology here, and call our
recognizers tree automata with memory (TAM). For consistency with the above formalisms,
the memory contents will be ground terms over a memory signature $\Gamma$.

As for TA, we consider bottom-up computations of TAM in trees; at each stage of
computation on a tree $t$, a TAM, like a TA, reads the function symbol at the current
position $p$ in $t$ and updates its current state, according to the states reached immediately
under $p$. Moreover, a configuration of a TAM contains not only a state but also a memory,
which is a tree. The current memory is updated according to the respective contents of
the memories reached in the nodes immediately under $p$ in $t$.

As above, we use term rewrite systems in order to define the transitions allowed in
a TAM. For this purpose, we add an argument to state symbols, which will contain the
memory. Hence, a configuration of a TAM in state $q$ and whose memory content is the
ground term $m \in \mathcal{T}(\Gamma)$ is represented by the term $q(m)$. We propose below a very general
definition of TAM. It is similar to the one of [7], except that we have here general patterns
$m_1, \ldots, m_n, m$, while these patterns are restricted in [7], for instance avoiding memory
duplications. Since we aim at providing closure and decision properties, we will also impose
(other) restrictions later on.

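Definition 2.1. A bottom-up tree automaton with memory (TAM) on a signature $\Sigma$ is a
tuple $(\Gamma, Q, Q_f, \Delta)$ where $\Gamma$ is a memory signature, $Q$ is a finite set of unary state
symbols, disjoint from $\Sigma \cup \Gamma$, $Q_f \subseteq Q$ is the subset of final states and $\Delta$ is a set of
rewrite rules of the form $f(q_1(m_1), \ldots, q_n(m_n)) \to q(m)$ where $f \in \Sigma_n$,
$q_1, \ldots, q_n, q \in Q$ and $m_1, \ldots, m_n, m \in \mathcal{T}(\Gamma, \mathcal{X})$.

The rules of $\Delta$ are also called transition rules. A term $t$ is accepted by $\mathcal{A}$ in state
$q \in Q$ and with memory $m \in \mathcal{T}(\Gamma)$ iff $t \to^*_{\mathcal{A}} q(m)$, and the language $L(\mathcal{A}, q)$ and
memory language $M(\mathcal{A}, q)$ of $\mathcal{A}$ in state $q$ are respectively defined by:

$$L(\mathcal{A}, q) = \{\, t \mid \exists m \in \mathcal{T}(\Gamma),\ t \to^*_{\mathcal{A}} q(m) \,\}$$

$$M(\mathcal{A}, q) = \{\, m \mid \exists t \in \mathcal{T}(\Sigma),\ t \to^*_{\mathcal{A}} q(m) \,\}.$$

The language of $\mathcal{A}$ is the union of the languages of $\mathcal{A}$ in its final states, denoted:
$L(\mathcal{A}) = \bigcup_{q \in Q_f} L(\mathcal{A}, q)$.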

Visibility Condition. The above formalism is of course far too expressive. As there are 
no restrictions on the operation performed on memory by the rewrite rules, one can easily 
encode a Turing machine as a TAM. We shall now define a decidable restriction called visibly 
tree automata with memory (VTAM). 

First, we consider only three main families (later divided into the subcategories defined
in Figure 1) of operations on memory. We assume below a computation step at some position
$p$ of a term, where memories $m_1, \ldots, m_n$ have been reached at the positions immediately
below $p$:

PUSH: the new current memory $m$ is built with a symbol $h \in \Gamma_n$ pushed on top of the
memories $m_1, \ldots, m_n$: $f(q_1(m_1), \ldots, q_n(m_n)) \to q(h(m_1, \ldots, m_n))$. According to the
terminology of [2], this corresponds to a call move in a program represented by an
automaton.

POP: the new current memory is a subterm of one of the memories reached so far:
$f(\ldots, q_i(h(m'_1, \ldots, m'_k)), \ldots) \to q(m'_j)$. The top symbol $h$ of $m_i$ is also read. This
corresponds to a function's return in a program.

We have here to split POP operations into four categories, depending on whether we 
pop on the memory at the left son or on the memory at the right son and on whether we 
get the left son of that memory or its right son. 

INT (internal): the new current memory is one of the memories reached: 

$$f(q_1(m_1), \ldots, q_n(m_n)) \to q(m_i)$$

This corresponds to an internal operation (neither call nor return) in a function of a 
program. 

Again, we need to split INT operations into three categories: one for constant symbols 
and two rules for binary symbols, depending on which of the two sons memories we keep. 

Next, we adhere to the visibility condition of [2]. The idea behind this restriction, 
which was already in [16], is that the symbol read by an automaton (in a term in our case 
and [1], in a word in the case of [2]) corresponds to an instruction of a program, and hence 
belongs to one of the three above families (call, return or internal). Indeed, the effect of 
the execution of a given instruction on the current program state (a stack for [2] or a tree
in our case) will always be in the same family. In other words, in this context, the family of 
the memory operations performed by a transition is completely determined by the function 
symbol read. 

Let us assume from now on for the sake of simplicity the following restriction on the 
arity of symbols: 






H. COMON-LUNDH, F. JACQUEMARD, AND N. PERRIN 



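PUSH:  $a \to q(c)$, for $a \in \Sigma_{\mathrm{PUSH}}$
PUSH:  $f(q_1(y_1), q_2(y_2)) \to q(h(y_1, y_2))$, for $f \in \Sigma_{\mathrm{PUSH}}$
POP11: $f(q_1(h(y_{11}, y_{12})), q_2(y_2)) \to q(y_{11})$ and $f(q_1(\bot), q_2(y_2)) \to q(\bot)$, for $f \in \Sigma_{\mathrm{POP11}}$
POP12: $f(q_1(h(y_{11}, y_{12})), q_2(y_2)) \to q(y_{12})$ and $f(q_1(\bot), q_2(y_2)) \to q(\bot)$, for $f \in \Sigma_{\mathrm{POP12}}$
POP21: $f(q_1(y_1), q_2(h(y_{21}, y_{22}))) \to q(y_{21})$ and $f(q_1(y_1), q_2(\bot)) \to q(\bot)$, for $f \in \Sigma_{\mathrm{POP21}}$
POP22: $f(q_1(y_1), q_2(h(y_{21}, y_{22}))) \to q(y_{22})$ and $f(q_1(y_1), q_2(\bot)) \to q(\bot)$, for $f \in \Sigma_{\mathrm{POP22}}$
INT0:  $a \to q(\bot)$, for $a \in \Sigma_{\mathrm{INT0}}$
INT1:  $f(q_1(y_1), q_2(y_2)) \to q(y_1)$, for $f \in \Sigma_{\mathrm{INT1}}$
INT2:  $f(q_1(y_1), q_2(y_2)) \to q(y_2)$, for $f \in \Sigma_{\mathrm{INT2}}$

where $q_1, q_2, q \in Q$, the $y_i$ and $y_{ij}$ are distinct variables of $\mathcal{X}$, $c \in \Gamma_0$, $h \in \Gamma_2$.

Figure 1: VTAM transition categories.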



All the symbols of $\Sigma$ and $\Gamma$ have either arity 0 or 2.
This is not a real restriction, and the results of this paper can be extended straightforwardly
to the case of function symbols with other arities. The signature $\Sigma$ is partitioned in eight
subsets:

$$\Sigma = \Sigma_{\mathrm{PUSH}} \uplus \Sigma_{\mathrm{POP11}} \uplus \Sigma_{\mathrm{POP12}} \uplus \Sigma_{\mathrm{POP21}} \uplus \Sigma_{\mathrm{POP22}} \uplus \Sigma_{\mathrm{INT0}} \uplus \Sigma_{\mathrm{INT1}} \uplus \Sigma_{\mathrm{INT2}}$$

The eight corresponding categories of transitions (transitions of the same category perform
the same kind of operation on the memory) are defined formally in Figure 1. In this figure,
one constant symbol has a particular role:

$\bot$ is a special constant symbol in $\Gamma$, used to represent an empty memory.

Note that there are three categories for INT: INT0 is for constant symbols, and INT1, INT2
are for binary symbols and differ according to the memory which is kept. Similarly, there
are four variants of POP transitions, POP11, ..., POP22. Moreover, each POP rule has a
variant which reads an empty memory (i.e. the symbol $\bot$).

Definition 2.2. A visibly tree automaton with memory (or VTAM for short) on $\Sigma$ is a
TAM $(\Gamma, Q, Q_f, \Delta)$ such that every rule of $\Delta$ belongs to one of the above categories PUSH,
POP11, POP12, POP21, POP22, INT0, INT1, INT2.
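
To make the eight categories concrete, here is a small sketch of a bottom-up evaluator computing the configurations (state, memory) reachable at the root of a term. It is an illustration under assumptions rather than the construction used in the proofs: terms and memories are nested tuples, `category` maps each symbol to its category name, and the rule table layout (`rules`, keyed by the left-member) is hypothetical.

```python
BOTTOM = ("⊥",)  # the empty memory

def configurations(t, category, rules):
    """Set of (state, memory) pairs reachable at the root of the term t.

    Assumed rule table layout:
      constants a:   rules[(a,)]             -> set of (q, c), with c == "⊥" for INT0
      PUSH:          rules[(f, q1, q2)]      -> set of (q, h)
      INT1 / INT2:   rules[(f, q1, q2)]      -> set of q
      POPij:         rules[(f, q1, q2, h)]   -> set of q  (h == "⊥" for the empty variant)
    """
    f = t[0]
    if len(t) == 1:
        return {(q, (c,)) for (q, c) in rules.get((f,), set())}
    kind = category[f]
    left = configurations(t[1], category, rules)
    right = configurations(t[2], category, rules)
    result = set()
    for (q1, m1) in left:
        for (q2, m2) in right:
            if kind == "PUSH":
                for (q, h) in rules.get((f, q1, q2), set()):
                    result.add((q, (h, m1, m2)))
            elif kind in ("INT1", "INT2"):
                kept = m1 if kind == "INT1" else m2
                for q in rules.get((f, q1, q2), set()):
                    result.add((q, kept))
            else:
                # POPij: pop the memory of son i and keep its j-th child.
                i, j = int(kind[3]), int(kind[4])
                popped = m1 if i == 1 else m2
                if popped == BOTTOM:
                    for q in rules.get((f, q1, q2, "⊥"), set()):
                        result.add((q, BOTTOM))
                elif len(popped) == 3:
                    for q in rules.get((f, q1, q2, popped[0]), set()):
                        result.add((q, popped[j]))
    return result

# Hypothetical one-state VTAM: 'e' is INT0, 'push' is PUSH, 'pop' is POP11,
# and there is no POP rule for the empty memory.
CATEGORY = {"e": "INT0", "push": "PUSH", "pop": "POP11"}
RULES = {
    ("e",): {("q", "⊥")},
    ("push", "q", "q"): {("q", "h")},
    ("pop", "q", "q", "h"): {"q"},
}
```

On this example, the term pop(push(e, e), e) reaches state q with an empty memory, whereas pop(e, e) reaches no configuration because no rule reads the empty memory.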

2.2. Expressiveness, Comparison. Standard bottom-up tree automata are particular
cases of VTAM (simply assume all the symbols of the signature in INT0 or INT1).

Now, let us try to explain more precisely the relation with the visibly pushdown lan- 
guages of [2], when considering finite word languages. 

If the stack is empty in any accepting configuration of some finite word pushdown
automaton $\mathcal{A}$, then it is easy to compute a pushdown automaton $\mathcal{A}'$ which accepts the
reverses (mirror images) of the words accepted by $\mathcal{A}$. Moreover, if $\mathcal{A}$ is a visibly pushdown
automaton, then $\mathcal{A}'$ is also a visibly pushdown automaton: it suffices to exchange the push
and pop symbols.

For pushdown word languages, there is a well-known lemma showing that recognition
by final state is equivalent to recognition by empty stack. This equivalence however
requires ε-transitions to empty the stack when a final state is reached, and there are
no ε-transitions in visibly pushdown automata. So, if we consider for instance the language
of words $w \in \{a, b\}^*$ such that any prefix of $w$ contains more a's than b's, it is recognized by
a visibly pushdown automaton, whereas its mirror image (all suffixes contain
more a's than b's) is not recognized by a visibly pushdown automaton.

In conclusion, as long as visibility is relevant, the way the automaton is moving is also 
relevant. This applies of course to trees as well: there is a difference between top-down and 
bottom-up recognition. 

Now, if we encode a word as a tree on a unary alphabet, starting from right to left, 
VTAM generalize visibly pushdown automata: moving bottom-up in the tree corresponds 
to moving left-right in the word. 

VPTA transitions and VPTL are defined in [5] in the same formalism (rewrite rules) as
in Figure 1, except that the rules are oriented in the other direction (top-down
computations) and the memory contains a word, i.e. terms built with unary function symbols
and one constant (empty stack).

As sketched above, since the automata of [5] work top-down, a language can be
recognized by a VTAM (which works bottom-up) and not by a VPTA. As a typical example,
consider the trees containing only unary symbols a, b and a constant and such that all
subterms contain more a's than b's.

But the converse is also true: there are similarly languages that are recognized by 
VPTA and not by VTAM (and there, constraints cannot help!) 

Now, if we consider a slight modification of VPTA, in which the automata work bottom-up
(simply change the direction of transition rules), it is not clear that good properties
(closure and decision) are preserved since, now, we get equality tests between memory 
contents, increasing the original expressive power; when going top-down we always duplicate 
the memory content and send one copy to each son, while going bottom-up we may have 
different memory contents at two brother positions. 

2.3. Determinism. A VTAM $\mathcal{A}$ is said to be complete if every term of $\mathcal{T}(\Sigma)$ belongs to
$L(\mathcal{A}, q)$ for at least one state $q \in Q$. Every VTAM can be completed (with a polynomial
overhead) by the addition of a trash state. Hence, we shall consider from now on only
complete VTAM.

A VTAM $\mathcal{A} = (\Gamma, Q, Q_f, \Delta)$ is said to be deterministic iff:

• for all $a \in \Sigma_{\mathrm{INT0}}$ there is at most one rule in $\Delta$ with left-member $a$,

• for all $f \in \Sigma_{\mathrm{PUSH}} \cup \Sigma_{\mathrm{INT1}} \cup \Sigma_{\mathrm{INT2}}$, for all $q_1, q_2 \in Q$, there is at most one rule in
$\Delta$ with left-member $f(q_1(y_1), q_2(y_2))$,

• for all $f \in \Sigma_{\mathrm{POP11}} \cup \Sigma_{\mathrm{POP12}}$ (respectively $\Sigma_{\mathrm{POP21}} \cup \Sigma_{\mathrm{POP22}}$), for all $q_1, q_2 \in Q$ and all
$h \in \Gamma$, there is at most one rule in $\Delta$ with left-member $f(q_1(h(y_{11}, y_{12})), q_2(y_2))$
(respectively $f(q_1(y_1), q_2(h(y_{21}, y_{22})))$).
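
The three conditions all say the same thing: a left-member determines at most one rule. As a hedged sketch, assume a hypothetical rule table `rules` that maps each left-member (encoded as a tuple such as `(f, q1, q2)` or `(f, q1, q2, h)`) to the set of right-members of the rules sharing it; determinism then amounts to a size check.

```python
def is_deterministic(rules):
    """True iff every left-member has at most one right-member.

    `rules` is an assumed encoding: left-member tuple -> set of right-members.
    """
    return all(len(targets) <= 1 for targets in rules.values())

# Two hypothetical tables over one PUSH symbol f and states p, q.
DET = {("f", "p", "q"): {("q", "h")}}
NONDET = {("f", "p", "q"): {("q", "h"), ("p", "h")}}
```

Here `is_deterministic(DET)` holds while `is_deterministic(NONDET)` does not, since two rules share the left-member f(p(y1), q(y2)).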

Theorem 2.3. For every VTAM $\mathcal{A} = (\Gamma, Q, Q_f, \Delta)$ there exists a deterministic VTAM
$\mathcal{A}^{det} = (\Gamma^{det}, Q^{det}, Q_f^{det}, \Delta^{det})$ such that $L(\mathcal{A}) = L(\mathcal{A}^{det})$, where $|Q^{det}|$ and $|\Gamma^{det}|$ both are

Proof. We follow the technique of [2] for the determinization of visibly pushdown automata:
we do a subset construction and postpone the application (to the memory) of PUSH rules,
until a matching POP is met. The construction of [2] is extended in order to handle the
branching structure of the term read and of the memory.

With the visibility condition, for each symbol read, only one kind of memory operation
is possible. This permits a uniform construction of the rules of $\mathcal{A}^{det}$ for each symbol of $\Sigma$.
As we shall see below, $\mathcal{A}^{det}$ does not need to keep track of the contents of memory (of $\mathcal{A}$)
during its computation; it only needs to memorize information on the reachability of states
of $\mathcal{A}$, following the path (in the term read) from the position of the PUSH symbol which has
pushed the top symbol of the current memory (let us call it the last-memory-push-position)
to the current position in the term. We let:

$$Q^{det} := \{0, 1\} \times \mathcal{P}(Q) \times \mathcal{P}(Q^2)$$

$Q_f^{det}$ is the subset of states whose second component contains a final state of $Q_f$. The first
component is a flag indicating whether the memory is currently empty (value 0) or not
(value 1). The second component is the subset of states of $Q$ that $\mathcal{A}$ can reach at the current
position, and the third component is a binary relation on $Q$ which contains $(q, q')$ iff starting
from a state $q$ and memory $m$ at the last-memory-push-position, $\mathcal{A}$ can reach the current
position in state $q'$, and with the same memory $m$. We consider memory symbols made of
pairs of states and PUSH symbols:

$$\Gamma^{det} := (Q^{det})^2 \times \Sigma_{\mathrm{PUSH}}$$

The components of a symbol $p \in \Gamma^{det}$ refer to the transition that pushed $p$: the first and
second components of $p$ are respectively the left and right initial states of the transition
and the third component is the symbol read.

The transition rules of $\mathcal{A}^{det}$ are given below, according to the symbol read.

INT. For every $f \in \Sigma_{\mathrm{INT1}}$, we have the following rules in $\Delta^{det}$:

$$f\big((b_1, R_1, S_1)(y_1), (b_2, R_2, S_2)(y_2)\big) \to (b_1, R, S)(y_1)$$

where $R := \{\, q \mid \exists q_1 \in R_1, q_2 \in R_2,\ f(q_1(y_1), q_2(y_2)) \to q(y_1) \in \Delta \,\}$, and $S$ is the update of
$S_1$ according to the INT1-transitions of $\mathcal{A}$, when $b_1 = 1$ (the case $b_1 = 0$ is similar):

$$S := \{\, (q, q') \mid \exists q_1 \in Q, q_2 \in R_2,\ (q, q_1) \in S_1 \text{ and } f(q_1(y_1), q_2(y_2)) \to q'(y_1) \in \Delta \,\}.$$

The case $f \in \Sigma_{\mathrm{INT2}}$ is similar.

PUSH. For every $f \in \Sigma_{\mathrm{PUSH}}$, we have the following rules in $\Delta^{det}$:

$$f\big((b_1, R_1, S_1)(y_1), (b_2, R_2, S_2)(y_2)\big) \to (1, R, Id_Q)(p(y_1, y_2))$$

where $R := \{\, q \mid \exists q_1 \in R_1, q_2 \in R_2, h \in \Gamma,\ f(q_1(y_1), q_2(y_2)) \to q(h(y_1, y_2)) \in \Delta \,\}$,
$Id_Q := \{\, (q, q) \mid q \in Q \,\}$ is used to initialize the memorization of state reachability from the
position of the symbol $f$, and $p := \big((b_1, R_1, S_1), (b_2, R_2, S_2), f\big)$. Note that the two states
reached just below the position of application of this rule are pushed on the top of the memory.
They will be used later in order to update $R$ and $S$ when a matching POP symbol is read.

POP. For every $f \in \Sigma_{\mathrm{POP11}}$, we have the following rules in $\Delta^{det}$:

$$f\big((b_1, R_1, S_1)(H(y_{11}, y_{12})), (b_2, R_2, S_2)(y_2)\big) \to (b, R, S)(y_{11})$$

where $H = (Q_1, Q_2, g)$, with $Q_1 = (b'_1, R'_1, S'_1) \in Q^{det}$, $Q_2 = (b'_2, R'_2, S'_2) \in Q^{det}$, $b = b'_1$,

$$R = \{\, q \mid \exists q'_1 \in R'_1, q'_2 \in R'_2, (q_0, q_1) \in S_1, q_2 \in R_2, h \in \Gamma,\ g(q'_1(y_1), q'_2(y_2)) \to q_0(h(y_1, y_2)) \in \Delta,\ f(q_1(h(y_{11}, y_{12})), q_2(y_2)) \to q(y_{11}) \in \Delta \,\}$$

$$S = \{\, (q, q') \mid \exists q'_1 \in S'_1(q), q'_2 \in R'_2, (q_0, q_1) \in S_1, q_2 \in R_2, h \in \Gamma,\ g(q'_1(y_1), q'_2(y_2)) \to q_0(h(y_1, y_2)) \in \Delta,\ f(q_1(h(y_{11}, y_{12})), q_2(y_2)) \to q'(y_{11}) \in \Delta \,\}$$

When a POP symbol is read, the top symbol of the memory, which is popped, contains the
states reached just before the application of the matching PUSH. We use this information
in order to update $(b_1, R_1, S_1)$ and $(b_2, R_2, S_2)$ to $(b, R, S)$.

The cases $f \in \Sigma_{\mathrm{POP12}}$, $f \in \Sigma_{\mathrm{POP21}}$, $f \in \Sigma_{\mathrm{POP22}}$ are similar.

The above constructions ensure the three invariants stated above, after the definition
of $Q^{det}$ and corresponding to the three components of these states. It follows that
$L(\mathcal{A}) = L(\mathcal{A}^{det})$.



2.4. Closure Properties. The tree automata with one memory of [7] are closed under
union but not closed under intersection and complement (even their version without
constraints). The visibility condition makes these closures possible for VTAM.

Theorem 2.4. The class of tree languages of VTAM is closed under Boolean operations. 
One can construct VTAM for union, intersection and complement of given VTAM languages 
whose sizes are respectively linear, quadratic and exponential in the size of the initial VTAM. 

Proof. Let $\mathcal{A}_1 = (\Gamma_1, Q_1, Q_{f,1}, \Delta_1)$ and $\mathcal{A}_2 = (\Gamma_2, Q_2, Q_{f,2}, \Delta_2)$ be two VTAM on $\Sigma$. We
assume wlog that $Q_1$ and $Q_2$ are disjoint.

For the union of the languages of $\mathcal{A}_1$ and $\mathcal{A}_2$, we construct a VTAM $\mathcal{A}_\cup$ whose memory
signature, state set, final state set and rules set are the union of the respective memory
signatures, state sets, final state sets and rules sets of the two given VTAM. We have
$L(\mathcal{A}_\cup) = L(\mathcal{A}_1) \cup L(\mathcal{A}_2)$.

$$\mathcal{A}_\cup = (\Gamma_1 \cup \Gamma_2,\ Q_1 \cup Q_2,\ Q_{f,1} \cup Q_{f,2},\ \Delta_1 \cup \Delta_2)$$

For the intersection of the languages of $\mathcal{A}_1$ and $\mathcal{A}_2$, we construct a VTAM $\mathcal{A}_\cap$ whose
memory signature, state set and final state set are the Cartesian product of the respective
memory signatures, state sets and final state sets of the two given VTAM.

$$\mathcal{A}_\cap = (\Gamma_1 \times \Gamma_2,\ Q_1 \times Q_2,\ Q_{f,1} \times Q_{f,2},\ \Delta_\cap)$$

The rule set $\Delta_\cap$ of the intersection VTAM $\mathcal{A}_\cap$ is obtained by "product" of rules of the two
given VTAM with same function symbols. The product of rules means Cartesian products
of the respective states and memory symbols pushed or popped. More precisely, $\Delta_\cap$ is the
smallest set of rules such that:

• if $\Delta_1$ contains $f(q_{11}(y_1), q_{12}(y_2)) \to q_1(h_1(y_1, y_2))$ and $\Delta_2$ contains
$f(q_{21}(y_1), q_{22}(y_2)) \to q_2(h_2(y_1, y_2))$, for some $f \in \Sigma_{\mathrm{PUSH}}$, then $\Delta_\cap$ contains
$f\big((q_{11}, q_{21})(y_1), (q_{12}, q_{22})(y_2)\big) \to (q_1, q_2)\big((h_1, h_2)(y_1, y_2)\big)$,

• if $\Delta_1$ contains $f(q_{11}(h_1(y_{11}, y_{12})), q_{12}(y_2)) \to q_1(y_{11})$ and $\Delta_2$ contains
$f(q_{21}(h_2(y_{11}, y_{12})), q_{22}(y_2)) \to q_2(y_{11})$ for some $f \in \Sigma_{\mathrm{POP11}}$, then $\Delta_\cap$ contains
$f\big((q_{11}, q_{21})((h_1, h_2)(y_{11}, y_{12})), (q_{12}, q_{22})(y_2)\big) \to (q_1, q_2)(y_{11})$,

• similarly for POP12, POP21 and POP22,

• if $\Delta_1$ contains $f(q_{11}(y_1), q_{12}(y_2)) \to q_1(y_1)$ and $\Delta_2$ contains $f(q_{21}(y_1), q_{22}(y_2)) \to q_2(y_1)$
for some $f \in \Sigma_{\mathrm{INT1}}$, then $\Delta_\cap$ contains $f\big((q_{11}, q_{21})(y_1), (q_{12}, q_{22})(y_2)\big) \to (q_1, q_2)(y_1)$,

• and similarly for INT2, INT0.

We have then $L(\mathcal{A}_\cap) = L(\mathcal{A}_1) \cap L(\mathcal{A}_2)$. Note that the above product construction for $\mathcal{A}_\cap$ is
possible only because the visibility condition ensures that two rules with the same function
symbol in left-side will have the same form. Hence we can synchronize memory operations
on the same symbols.
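
The product of rules for binary symbols can be sketched as follows. The encoding is an assumption made for illustration: a rule table maps a left-member tuple `(f, q1, q2)` (PUSH, INT) or `(f, q1, q2, h)` (POP) to a set of right-members, namely pairs `(q, h)` for PUSH and states `q` otherwise; `product_rules` is a hypothetical helper name. Rules for constants and for the empty memory are left out of this sketch.

```python
def product_rules(rules1, rules2, category):
    """Pairwise product of two rule tables over the same partitioned signature."""
    prod = {}
    for key1, targets1 in rules1.items():
        for key2, targets2 in rules2.items():
            f = key1[0]
            # Only rules reading the same symbol (hence of the same form) are paired.
            if key2[0] != f or len(key1) != len(key2):
                continue
            # Pair the states componentwise (and the popped symbols for POP rules).
            key = (f,) + tuple(zip(key1[1:], key2[1:]))
            if category[f] == "PUSH":
                new = {((q1, q2), (h1, h2))
                       for (q1, h1) in targets1 for (q2, h2) in targets2}
            else:
                new = {(q1, q2) for q1 in targets1 for q2 in targets2}
            prod.setdefault(key, set()).update(new)
    return prod

# Hypothetical example with one PUSH symbol f and one POP11 symbol g.
CAT = {"f": "PUSH", "g": "POP11"}
R1 = {("f", "p", "p"): {("p", "h")}, ("g", "p", "p", "h"): {"p"}}
R2 = {("f", "r", "s"): {("s", "k")}, ("g", "s", "r", "k"): {"r"}}
```

Because the category of a symbol fixes the shape of its rules, the componentwise pairing is always well defined; without visibility, a PUSH rule of one automaton could meet a POP rule of the other on the same symbol.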

For the complement, we use the construction of Theorem 2.3 and a completion (this
operation preserves determinism), and take the complement of the final state set of the
VTAM obtained. □



2.5. Decision Problems. Every VTAM is a particular case of tree automaton with one 
memory of [7]. Since the emptiness problem (whether the language accepted is empty or 
not) is decidable for this latter class, it is also decidable for VTAM. However, whereas this 
problem is EXPTIME-complete for the automata of [7], it is only PTIME for VTAM. 

Theorem 2.5. The emptiness problem is PTIME-complete for VTAM.

Proof. Assume given a VTAM $\mathcal{A} = (\Gamma, Q, Q_f, \Delta)$. By definition, for each state $q \in Q$, the
language $L(\mathcal{A}, q)$ is empty iff the memory language $M(\mathcal{A}, q)$ is empty. For each state $q$, we
introduce a predicate symbol $P_q$ and we construct Horn clauses in such a way that $P_q(m)$
belongs to the least Herbrand model of this set of clauses iff the configuration with state $q$
and memory $m$ is reachable by the automaton (i.e. $m \in M(\mathcal{A}, q)$).

For such a construction (already given in [7]), we simply forget the function symbol,
associating to a transition rule $f(q_1(m_1), q_2(m_2)) \to q(m)$ the Horn clause
$P_{q_1}(m_1), P_{q_2}(m_2) \Rightarrow P_q(m)$. Then, according to the restrictions in Definition 2.2, we get only
Horn clauses of one of the following forms:

$$P_{q_1}(y_1), P_{q_2}(y_2) \Rightarrow P_q(h(y_1, y_2))$$
$$P_{q_1}(h(y_{11}, y_{12})), P_{q_2}(y_2) \Rightarrow P_q(y_{11})$$
$$P_{q_1}(h(y_{11}, y_{12})), P_{q_2}(y_2) \Rightarrow P_q(y_{12})$$
$$P_{q_1}(\bot), P_{q_2}(y_2) \Rightarrow P_q(\bot)$$
$$P_{q_1}(y_1), P_{q_2}(y_2) \Rightarrow P_q(y_1)$$

where all the variables are distinct. Such clauses belong to the class $\mathcal{H}_3$ of [19], for which
it is proved in [19] that emptiness is decidable in cubic time. It follows that emptiness of
VTAM is decidable in cubic time.

Hardness for PTIME follows from the PTIME-hardness of emptiness of finite tree
automata [8]. □

Another proof relying on similar techniques, but for a more general result, will be stated
in Lemma 3.7 and can be found in the appendix.

Universality is the problem of deciding whether a given automaton recognizes
all ground terms. Inclusion refers to the problem of deciding the inclusion between the
respective languages of two given automata.

Corollary 2.6. The universality and inclusion problems are EXPTIME-complete for VTAM.

Proof. A VTAM $\mathcal{A}$ is universal iff the language of its complement automaton $\overline{\mathcal{A}}$ is empty,
and $L(\mathcal{A}_1) \subseteq L(\mathcal{A}_2)$ iff $L(\mathcal{A}_1) \cap L(\overline{\mathcal{A}_2}) = \emptyset$. With the bounds given in Theorem 2.3, these
problems can be decided in EXPTIME for VTAM (these operations require a determinization
of a given VTAM first).

The EXPTIME-hardness follows from the corresponding property of finite tree
automata (see [8] for instance). □

The membership problem is, given a term t and an automaton A, to know whether t is 
accepted by A. 

Corollary 2.7. The membership problem is decidable in PTIME for VTAM.

Proof. Given a term $t$ we can build a VTAM $\mathcal{A}_t$ which recognizes exactly the language
$\{t\}$. The intersection of $\mathcal{A}_t$ with the given VTAM $\mathcal{A}$ recognizes a non-empty language iff $t$
belongs to the language of $\mathcal{A}$. □



3. Visibly Tree Automata with Memory and Constraints 

In the late eighties, some models of tree recognizers were obtained by adding equality 
and disequality constraints in transitions of tree automata. They have been proposed in 
order to solve problems with term rewrite systems or constraint systems with non-linear
patterns (terms with multiple occurrences of the same variable). The tree automata of [3] 
for instance can perform equality and disequality tests between subterms located at brother 
positions of the input term. 

In the case of tree automata with memory, constraints are applied to the memory 
contents. Indeed, each bottom-up computation step starts with two states and two memories 
(and ends with one state and one memory), and therefore, it is possible to compare the 
contents of these two memories, with respect to some binary relation. 

We first state the general definition of visibly tree automata with constraints on
memories (Section 3.1), then give sufficient conditions on the binary relation for the
decidability of emptiness (Section 3.2), and show that, although regular binary relations
do not satisfy these conditions in general (indeed, the corresponding class of constrained
VTAM has an undecidable emptiness problem, Section 3.3), some relevant examples do satisfy
them. In particular, we study in Section 3.4.2 the case of VTAM with structural equality
constraints. They enjoy not only decision properties but also good closure properties. Some
relevant examples of tree languages recognized by constrained VTAM of this class are
presented at the end of the section.

Figure 2: New transition categories for VTAM$^R_{\neg R}$.

3.1. Definitions. Assume given a fixed equivalence relation $R$ on $T(\Gamma)$. We now consider
two new categories for the symbols of $\Sigma$: INT$_1^R$ and INT$_2^R$, in addition to the eight
categories introduced previously. The new categories correspond to the constrained versions
of the transition rules INT$_1$ and INT$_2$ presented in Figure 2. The constraint $y_1 \mathrel{R} y_2$ in the
first two rules of Figure 2 is called positive and the constraint $y_1 \mathrel{\neg R} y_2$ in the last two
rules is called negative.

We shall not extend the rules PUSH and POP with constraints, for reasons explained in
Section 3.5. A ground term $t$ rewrites to $s$ by a constrained rule
$f(q_1(y_1), q_2(y_2)) \xrightarrow{y_1 \mathrel{c} y_2} r$ (where $c$ is either $R$ or $\neg R$) if there exists a position $p$ of $t$
and a substitution $\sigma$ such that $t|_p = f(q_1(y_1), q_2(y_2))\sigma$, $y_1\sigma \mathrel{c} y_2\sigma$ and $s = t[r\sigma]_p$.

For example, if R is term equality, the transition is performed only when the memory 
contents are identical. 
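As an illustration only, one constrained INT$_1$ step can be sketched in Python. The representation is an assumption made for this sketch and is not part of the paper: a configuration is a pair (state, memory), memories are nested tuples, a rule is a tuple (q1, q2, positive, q), and the relation $R$ is passed as a Boolean function. The names `apply_int1` and `term_equality` are hypothetical.

```python
def apply_int1(rule, left, right, related):
    """Fire one constrained INT_1 rule on two child configurations, if possible.

    rule    = (q1, q2, positive, q): expected child states, polarity of the
              constraint (True for R, False for not-R) and target state
    left    = (state, memory) reached on the left child
    right   = (state, memory) reached on the right child
    related = the relation R, given as a boolean function on two memories
    Returns the new configuration, or None if the rule does not apply.
    """
    q1, q2, positive, target = rule
    (s1, m1), (s2, m2) = left, right
    if (s1, s2) != (q1, q2):
        return None                    # the states do not match the left member
    if related(m1, m2) != positive:
        return None                    # the constraint is not satisfied
    return (target, m1)                # INT_1 keeps the first memory unchanged


def term_equality(m1, m2):
    """R taken as syntactic equality of memories (assumption of this sketch)."""
    return m1 == m2
```

With term equality as $R$, a positive rule fires only on identical memories, while a negative rule with the same left member fires exactly in the remaining cases.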

Definition 3.1. A visibly tree automaton with memory and constraints (VTAM$^R_{\neg R}$) on a
signature $\Sigma$ is a tuple $(\Gamma, R, Q, Q_f, \Delta)$ where $\Gamma$, $Q$, $Q_f$ are defined as for TAM, $R$ is an
equivalence relation on $T(\Gamma)$ and $\Delta$ is a set of rewrite rules in one of the above categories:
PUSH, POP$_{11}$, POP$_{12}$, POP$_{21}$, POP$_{22}$, INT$_0$, INT$_1$, INT$_2$, INT$_1^R$, INT$_2^R$.

We let VTAM$^R$ be the subclass of VTAM$^R_{\neg R}$ with positive constraints only. The
acceptance of terms of $T(\Sigma)$ and the languages of terms and memories are defined and
denoted as in Section 2.1.

The definition of complete VTAM$^R_{\neg R}$ is the same as for VTAM. As for VTAM, every
VTAM$^R_{\neg R}$ can be completed (with a polynomial overhead) by the addition of a trash state
$q_\perp$. The only subtle difference concerns the constrained rules: for every $f_9 \in$ INT$_1^R$ and
all states $q_1, q_2$,

• if there is a rule $f_9(q_1(y_1), q_2(y_2)) \xrightarrow{y_1 R y_2} q(y_1)$ and no rule of the form
$f_9(q_1(y_1), q_2(y_2)) \xrightarrow{y_1 \neg R y_2} q'(y_1)$, then we add $f_9(q_1(y_1), q_2(y_2)) \xrightarrow{y_1 \neg R y_2} q_\perp(y_1)$,

• if there is a rule $f_9(q_1(y_1), q_2(y_2)) \xrightarrow{y_1 \neg R y_2} q(y_1)$ and no rule of the form
$f_9(q_1(y_1), q_2(y_2)) \xrightarrow{y_1 R y_2} q'(y_1)$, then we add $f_9(q_1(y_1), q_2(y_2)) \xrightarrow{y_1 R y_2} q_\perp(y_1)$,

• if there is no rule of the form $f_9(q_1(y_1), q_2(y_2)) \xrightarrow{y_1 R y_2} q(y_1)$ or
$f_9(q_1(y_1), q_2(y_2)) \xrightarrow{y_1 \neg R y_2} q'(y_1)$, then we add both
$f_9(q_1(y_1), q_2(y_2)) \xrightarrow{y_1 R y_2} q_\perp(y_1)$ and $f_9(q_1(y_1), q_2(y_2)) \xrightarrow{y_1 \neg R y_2} q_\perp(y_1)$.

The definition of deterministic VTAM$^R_{\neg R}$ is based on the same conditions as for VTAM
for the function symbols in the categories PUSH$_0$, PUSH, POP$_{11}$, ..., POP$_{22}$, INT$_1$, INT$_2$. For
the function symbols of INT$_1^R$ and INT$_2^R$, we have the following condition: for all
$f \in \Sigma_{\mathrm{INT}_1^R} \cup \Sigma_{\mathrm{INT}_2^R}$ and
for all $q_1, q_2 \in Q$, there are at most two rules in $\Delta$ with left member $f(q_1(y_1), q_2(y_2))$,
and if there are two, one has a positive constraint and the other has a negative constraint.

We will see in Section 3.4 a subclass of VTAM$^R_{\neg R}$ that can be determinized (when $R$ is
structural equality) and another one that cannot (when $R$ is syntactic equality).

3.2. Sufficient Conditions for Emptiness Decision. We propose here a generic theorem
ensuring the decidability of emptiness for VTAM$^R_{\neg R}$. The idea of this theorem is that, under
some condition on $R$, the transition rules with negative constraints can be eliminated.

Theorem 3.2. Let $R$ be an equivalence relation satisfying these two properties:

i. for every automaton $A$ of VTAM$^R$ and for every state $q$ of $A$, the memory language
$M(A, q)$ is effectively a regular tree language,

ii. for every term $m \in T(\Gamma)$, the equivalence class of $m$ for $R$ is finite
and its elements can be enumerated.

Then the emptiness problem is decidable for VTAM$^R_{\neg R}$.

Proof. The proof relies on the following Lemma 3.3, which states that the negative
constraints in VTAM$^R_{\neg R}$ can be eliminated while preserving the memory languages. The
elimination can be done thanks to condition ii, by replacement of the rules of INT$_1^{\neg R}$ and
INT$_2^{\neg R}$ by rules of INT$_1^R$ and INT$_2^R$.

Next, we can use i in order to decide emptiness for the VTAM$^R$ obtained by elimination
of negative constraints. Indeed, for all states $q$ of $A$, by definition, $L(A, q)$ is empty iff
$M(A, q)$ is empty. □

Lemma 3.3. Let $R$ satisfy the hypotheses i and ii of Theorem 3.2 and let $A =
(\Gamma, R, Q, Q_f, \Delta)$ be a VTAM$^R_{\neg R}$. There exists a VTAM$^R$ $A^+ = (\Gamma, R, Q^+, Q_f, \Delta^+)$ such
that $Q \subseteq Q^+$, and for each $q \in Q$, $M(A^+, q) = M(A, q)$.

Proof. The construction of $A^+$ is by induction on the number $n$ of rules with negative
constraints in $A$ and uses the bound on the size of equivalence classes, condition ii of the
theorem.

The result is immediate if $n = 0$.

We assume that the result is true for $n - 1$ rules, and show that we can get rid of a rule
of $A$ with a negative constraint (and replace it with rules that are unconstrained or carry
positive constraints). Let us consider one such rule:

$f(q_1(y_1), q_2(y_2)) \xrightarrow{y_1 \neg R y_2} q(y_1)$    (3.1)

We show that, under the induction hypothesis, we have the following lemma, which will
be used below in order to get rid of the rule (3.1).

Lemma 3.4. Given $m_1, \ldots, m_k \in M(A, q_2)$, it is effectively decidable whether
$M(A, q_2) \setminus \{m_1, \ldots, m_k\}$ is empty or not and, in case it is not empty, we can effectively
build an $m_{k+1}$ in this set.

Proof. Let $[m_i]_R$ denote the equivalence class of $m_i$. By condition ii, every $[m_i]_R$ is finite,
hence for each $i \leq k$, we can build a VTAM $A_i$ with a state $p_i$ such that $M(A_i, p_i)$ is the
complement of $[m_i]_R$. We add all the rules of $A_i$ to $A$, obtaining $A'$ (we assume that the
state sets of $A_1, \ldots, A_k, A$ are disjoint, and that the states of $A_1, \ldots, A_k$ are not final in
$A'$).

Since $R$ is an equivalence relation, we have:

$y_1 \mathrel{\neg R} m_i$ iff $y_1 \notin [m_i]_R$ iff $\exists y_2 \notin [m_i]_R,\ y_1 \mathrel{R} y_2$

Hence, if $y_2 = m_i$ is a witness for the rule (3.1), then we can apply instead a rule:

$f(q_1(y_1), p_i(y_2)) \xrightarrow{y_1 R y_2} q(y_1)$    (3.2)

Then we add to $A'$ the rules (3.2) as above and obtain $A''$. It can be shown that
$M(A'', q_2) = M(A, q_2)$.

Let $m_{k+1}$ be a term of $M(A'', q_2) \setminus \{m_1, \ldots, m_k\}$ of minimal size (if one exists). This
term $m_{k+1}$ can be created in a run of $A''$ which does not use the rule (3.1). Otherwise, the
witness for $y_2$ in the application of this rule would be a term of $M(A'', q_2) \setminus \{m_1, \ldots, m_k\}$
smaller than $m_{k+1}$ (it cannot be one of $\{m_1, \ldots, m_k\}$ because for these particular values
of $y_2$, we assume the application of (3.2)). It follows that $m_{k+1} \in M(A'' \setminus (3.1), q_2)$. This
automaton $A_1 = A'' \setminus (3.1)$ has $n - 1$ rules with negative constraints. Hence, by induction
hypothesis, there is a VTAM$^R$ $A_1^+$ with $m_{k+1}$ in its memory language $M(A_1^+, q_2)$. By
condition i, this language is regular and we can build $m_{k+1}$ from a TA for this language. □

Now, let us come back to the proof that we can replace rule (3.1) while preserving the
memory languages.

If $M(A, q_2) = \emptyset$ (which can be effectively decided according to Lemma 3.4), then the rule
(3.1) is useless and can be removed from $A$ without changing its memory languages. Note that
the condition $M(A, q_2) = \emptyset$ is decidable because, by hypothesis i, $M(A, q_2)$ is regular.
Otherwise, let $m_1 \in M(A, q_2)$ be built with Lemma 3.4 and let $N_1$ be the cardinal of the
equivalence class $[m_1]_R$. We apply $N_1$ times the construction of Lemma 3.4. There are
three cases:

(1) if we find more than $N_1$ terms in $M(A, q_2)$, then one of them, say $m_i$, is not in $[m_1]_R$.
Then the constraint of (3.1) is useless from the point of view of memory languages: whatever
the value of $y_1$, we know a $y_2 \in M(A, q_2)$ which permits firing the rule. Indeed, if
$y_1 \in [m_1]_R$ then we can choose $y_2 = m_i$, and otherwise we choose $y_2 = m_1$. Hence (3.1) can
be replaced without changing the memory language by:

$f(q_1(y_1), q_0(y_2)) \to q(y_1)$    (3.3)

where $q_0$ is any state of $A$ such that $M(A, q_0) \neq \emptyset$. We can then apply the induction
hypothesis to the VTAM$^R_{\neg R}$ obtained.

(2) if we find fewer than $N_1$ terms in $M(A, q_2)$, but one is not in $[m_1]_R$, the case is the same
as above.

(3) if we find fewer than $N_1$ terms in $M(A, q_2)$, all in $[m_1]_R$, it means that one of the
applications of Lemma 3.4 was not successful, and hence that we have found all the terms
of $M(A, q_2)$. It follows that the rule (3.1) can be fired iff $y_1 \notin [m_1]_R$, i.e. iff there exists
$y_2 \notin [m_1]_R$ such that $y_1 \mathrel{R} y_2$. Hence, we can replace (3.1) by

$f(q_1(y_1), p_1(y_2)) \xrightarrow{y_1 R y_2} q(y_1)$.

Then we can apply the induction hypothesis.

□

□ 

We present in Section 3.4 two examples of relations satisfying i and ii.

3.3. Regular Tree Relations. We first consider the general case of VTAM$^R_{\neg R}$ where the
equivalence $R$ is based on an arbitrary regular binary relation on $T(\Gamma)$. By regular binary
relation, we mean a set of pairs of ground terms accepted by a tree automaton computing
simultaneously in both terms of the pair. More formally, we use a coding of a pair of terms
of $T(\Sigma) \times T(\Sigma)$ into a term of $T((\Sigma \cup \{\perp\})^2)$, where $\perp$ is a new constant symbol (not
in $\Sigma$). This coding is defined recursively by:

$\otimes : (T(\Sigma) \cup \{\perp\}) \times (T(\Sigma) \cup \{\perp\}) \to T((\Sigma \cup \{\perp\})^2)$

• for all $a, b \in \Sigma_0 \cup \{\perp\}$, $a \otimes b := \langle a, b \rangle$,

• for all $a \in \Sigma_0 \cup \{\perp\}$, $f \in \Sigma_2$, $t_1, t_2 \in T(\Sigma)$,
$f(t_1, t_2) \otimes a := \langle f, a \rangle(t_1 \otimes \perp, t_2 \otimes \perp)$ and
$a \otimes f(t_1, t_2) := \langle a, f \rangle(\perp \otimes t_1, \perp \otimes t_2)$,

• for all $f, g \in \Sigma_2$, $s_1, s_2, t_1, t_2 \in T(\Sigma)$,
$f(s_1, s_2) \otimes g(t_1, t_2) := \langle f, g \rangle(s_1 \otimes t_1, s_2 \otimes t_2)$.

Then, a binary relation $R \subseteq T(\Sigma) \times T(\Sigma)$ is called regular iff the set $\{s \otimes t \mid (s, t) \in R\}$ is
regular. The above coding of pairs is unrelated to the product used in Theorem 2.4.
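The recursive coding of pairs can be written down directly. The following Python sketch is only an illustration under assumed conventions that are not in the paper: a constant is a string, a binary term is a tuple `(f, t1, t2)`, the padding symbol is the string `BOT`, and a node of the coded term is a tuple whose first component is the pair of labels. The function name `encode` is hypothetical.

```python
BOT = "⊥"  # padding constant, assumed not to occur in the signature


def is_const(t):
    """Constants (and the padding symbol) are represented as plain strings."""
    return isinstance(t, str)


def encode(s, t):
    """Code the pair (s, t) as one term over pairs of symbols."""
    if is_const(s) and is_const(t):
        return ((s, t),)                                    # a (x) b = <a, b>
    if is_const(t):
        f, s1, s2 = s                                       # f(s1, s2) (x) a
        return ((f, t), encode(s1, BOT), encode(s2, BOT))
    if is_const(s):
        g, t1, t2 = t                                       # a (x) g(t1, t2)
        return ((s, g), encode(BOT, t1), encode(BOT, t2))
    f, s1, s2 = s
    g, t1, t2 = t                                           # f(...) (x) g(...)
    return ((f, g), encode(s1, t1), encode(s2, t2))
```

The coded term overlays the two input terms position by position, which is what lets a single bottom-up automaton read both components simultaneously.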

Theorem 3.5. The membership problem for VTAM$^R_{\neg R}$ is NP-complete when $R$ is a regular
binary relation.

Proof. Assume given a ground term $t \in T(\Sigma)$ and a VTAM$^R_{\neg R}$ $A = (\Gamma, R, Q, Q_f, \Delta)$. Because
of the visibility condition, for every subterm $s$ of $t$, we can compute in polynomial time in the
size of $s$ the shape denoted $struct(s)$, which is an abstraction of the memory reached when
$A$ runs on $s$. More precisely, $struct(s)$ is an unlabeled tree, and every possible content of
memory $m$ reachable by $A$ in a computation $s \to^* q(m)$ is obtained by a labeling of the
nodes of $struct(s)$ with symbols of $\Gamma$. Note that for every subterm $s$, the size of $struct(s)$ is
smaller than the size of $t$.

Let us guess a decoration of every node of $t$ with a state of $Q$ and a labeling of $struct(s)$
(where $s$ is the subterm of $t$ at the given node), such that the root of $t$ is decorated with
a final state of $Q_f$. We can check in polynomial time whether this decoration represents a
run of $A$ on $t$ or not.

The NP-hardness is a consequence of Theorem 3.9, which applies to the particular case
where $R$ is the syntactic equality between terms. □
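To make the role of the visibility condition concrete, here is a minimal Python sketch of the computation of $struct(s)$. It relies on assumptions that are not stated in this section: PUSH builds a binary node over the two memories, POP$_{ij}$ returns the $j$-th child of the $i$-th memory, INT$_i$ keeps the $i$-th memory, every constant yields a leaf memory, and popping a leaf memory leaves a leaf. The names `struct`, `LEAF` and the dictionary `category` are hypothetical.

```python
LEAF = ()  # shape of a memory reduced to a single (unlabeled) leaf


def struct(term, category):
    """Shape of the memory after reading `term`; it depends only on the symbols read.

    term     : a constant (string) or a tuple (f, left, right)
    category : maps each binary symbol to "PUSH", "POP11", "POP12", "POP21",
               "POP22", "INT1" or "INT2" (constrained INT symbols behave like INT)
    """
    if isinstance(term, str):
        return LEAF                               # assumption: constants give a leaf
    f, left, right = term
    s1, s2 = struct(left, category), struct(right, category)
    cat = category[f]
    if cat == "PUSH":
        return (s1, s2)                           # new node above both memories
    if cat == "INT1":
        return s1
    if cat == "INT2":
        return s2
    if cat.startswith("POP"):
        source = s1 if cat[3] == "1" else s2      # which memory is popped
        if source == LEAF:
            return LEAF                           # assumption: nothing to pop
        return source[0] if cat[4] == "1" else source[1]
    raise ValueError("unknown category: " + cat)
```

Since no state appears in this computation, every run of the automaton on the same term produces memories of the same shape, whatever nondeterministic choices it makes.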

Note that the NP algorithm works with every equivalence $R$ based on a regular relation,
but the NP-hardness concerns only some of these relations. For instance, in
Section 3.4 we will see one example of a relation for which membership is NP-hard and
another example for which it is in PTIME.

The class of VTAM$^R_{\neg R}$ when $R$ is a binary regular tree relation constitutes a nice and
uniform framework. Note however that condition ii of Theorem 3.2 is not always true in
this case. Actually, this class is too expressive.

Theorem 3.6. Given a regular binary relation $R$ and an automaton $A$ in VTAM$^R$, the
emptiness of $L(A)$ is undecidable.

Proof. We reduce the blank accepting problem for a deterministic Turing machine $\mathcal{M}$. We
encode configurations of $\mathcal{M}$ as "right-combs" (binary trees) built with the tape and state
symbols of $\mathcal{M}$, in $\Sigma_{\mathrm{PUSH}}$ (hence binary), and a constant symbol $\varepsilon$ in $\Sigma_{\mathrm{INT}_0}$. Let $R$ be the
regular relation which accepts all the pairs of configurations $c \otimes c'$ such that $c'$ is a successor
of $c$ by $\mathcal{M}$. A sequence of configurations $c_0 c_1 \ldots c_n$ (with $n > 1$) is encoded as a tree
$t = f(c_0, f(c_1, \ldots f(c_{n-1}, c_n)))$, where $f$ is a binary symbol of $\Sigma_{\mathrm{INT}_1^R}$.

We construct a VTAM$^R$ $A$ which accepts exactly the term-representations $t$ of computation
sequences of $\mathcal{M}$ starting with the initial configuration $c_0$ of $\mathcal{M}$ and ending with
a final configuration $c_n$ with blank tape. Following the type of the function symbols, the
rules of $A$ will

• push all the symbols read in subterms of $t$ corresponding to configurations,

• compare, with $R$, $c_i$ and $c_{i+1}$ (the memory contents in respectively the left and right
branches) and store $c_i$ in the memory, with a transition applied at the top of a subterm
$f(c_i, f(c_{i+1}, \ldots))$.

This way, $A$ checks that successive configurations in $t$ correspond to transitions of $\mathcal{M}$,
hence the language of $A$ is not empty iff $\mathcal{M}$ accepts the initial configuration $c_0$. □



3.4. Syntactic and Structural Equality and Disequality Constraints. We now present
two examples of relations satisfying the conditions of Theorem 3.2: syntactic and
structural term equality. The satisfaction of condition i will be proved with the help of the
following crucial lemma.

Lemma 3.7. Let $R$ be a regular binary relation defined by a TA whose state set is
$\{R_i \mid i = 1..n\}$ and such that
$\forall i, j\ \exists k, l\ \forall x, y, z.\ x R_i y \wedge y R_j z \Leftrightarrow x R_k y \wedge x R_l z$.
Let $A = (\Gamma, R, Q, Q_f, \Delta)$ be a tree automaton with memory and constraints (not necessarily
visibly). Then it is possible to compute in exponential time a finite tree automaton $A'$, such
that, for every state $q \in Q$, the language $M(A, q)$ is the language accepted in some state
of $A'$.

Proof. (Sketch) To prove this lemma, we first observe that the $M(A, q)$ (for $q \in Q$) are
actually the least sets that satisfy the following conditions (we assume here for simplicity
that the non-constant symbols are binary and display only some of the implications; the
others can be easily guessed), for all $x, y, z$:

$x \in M(A, q_1),\ y \in M(A, q_2) \Rightarrow g(x, y) \in M(A, q)$
    if there is a rule $f(q_1(x_1), q_2(x_2)) \to q(g(x_1, x_2))$

$g(x, y) \in M(A, q_1),\ z \in M(A, q_2) \Rightarrow x \in M(A, q)$
    if there is a rule $f(q_1(g(x, y)), q_2(z)) \to q(x)$

$x \in M(A, q_1),\ y \in M(A, q_2),\ R(x, y) \Rightarrow x \in M(A, q)$
    if there is a rule $f(q_1(x), q_2(y)) \xrightarrow{x R y} q(x)$

In terms of automata, this means that $M(A, q)$ is a language recognized by a two-way
alternating tree automaton with regular binary constraints. In other words, such languages
are the least Herbrand model of a set of clauses of the form

$Q_1(y_1), Q_2(y_2), R(y_1, y_2) \Rightarrow Q_3(y_1)$    INT$_1^R$, INT$_2^R$
$Q_1(y_1), Q_2(y_2) \Rightarrow Q_3(f(y_1, y_2))$    PUSH
$\Rightarrow Q_1(a)$    INT$_0$
$Q_1(f(y_1, y_2)), Q_2(y_3) \Rightarrow Q_3(y_1)$    POP$_{11}$, POP$_{21}$
$Q_1(f(y_1, y_2)), Q_2(y_3) \Rightarrow Q_3(y_2)$    POP$_{12}$, POP$_{22}$

The lemma then shows that languages that are recognized by two-way alternating
tree automata with some particular regular constraints are also recognized by a finite
tree automaton. This corresponds to classical reductions of two-way automata to one-way
automata (see e.g. [8], chapter 7, [13], or [12], [6] for the first relevant references). The idea
of the reduction is to find shortcuts: moving up and down yields a move at the same
level. Such shortcuts are added as new rules, until a "complete set" is obtained. Then only
the non-redundant rules are kept: this yields a finite tree automaton. Such a procedure relies on
the definitions of ordered strategies, redundancy and saturation (aka complete sets), which
are classical notions in automated first-order theorem proving [13], [3], [20]. Indeed, formally,
a "shortcut" must be a formula which allows for smaller proofs than the proof using the
two original rules. A saturated set corresponds to a set of formulas all of whose shortcuts are
already in the set.

The clausal formalism has two advantages. First, it enables an easy representation of the above
shortcuts as intermediary steps: such shortcuts are clauses, but are not automata rules.
Second, we may rely on completeness results for Horn clauses.

That is why, only for the proof of this lemma, which follows and extends the classical
proofs by adding some regular constraints, we switch to a first-order logic formalization. The
complete proof can be found in the Appendix. As in the classical proofs, we saturate the
set of clauses by resolution with selection and eager splitting. This saturation terminates,
and the set of clauses corresponding to finite tree automata transitions in the saturated set
recognizes the language $M(A, q)$, which is therefore regular. □

The condition on $R$ in the lemma allows us to break chains such as
$\exists x_1, \ldots, x_n.\ x R x_1 \wedge x_1 R x_2 \wedge \cdots \wedge x_n R y \wedge P(x, y)$, which would be a source of
non-termination in the saturation procedure. We may indeed replace such chains by
$\exists x_1, \ldots, x_n.\ x R_1 x_1 \wedge x R_2 x_2 \wedge \cdots \wedge x R_n x_n \wedge x R_0 y \wedge P(x, y)$, which can again be simplified
into $\exists x_1.\ x S x_1 \wedge x R_0 y \wedge P(x, y)$ where $S$ is the intersection of $R_1, \ldots, R_n$. Possible
such intersections range in a finite set as the relation $R$ is regular and the $R_i$'s are states
of the automaton accepting $R$.

Finally, note that finding $k, l$ in the lemma's assumption can always be performed in an
effective way since $R$ is regular.

3.4.1. Syntactic Constraints. We first apply Lemma 3.7 to the class VTAM$^=_{\neq}$, where $=$
denotes the equality between ground terms made of memory symbols. Note that it is a
particular case of the constrained VTAM$^R_{\neg R}$ of Section 3.3, since term equality is
a regular relation. The automata of the subclass with positive constraints only, VTAM$^=$,
are particular cases of the tree automata with one memory of [7], and therefore have a
decidable emptiness problem. We show below that VTAM$^=_{\neq}$ fulfills the hypotheses of
Theorem 3.2, and hence that emptiness is also decidable for the whole class.

We can first verify that the relation $=$ satisfies the hypothesis of Lemma 3.7, hence
condition i of Theorem 3.2. Moreover, the relation $=$ obviously also satisfies condition ii
of Theorem 3.2, since every equivalence class is a singleton.

Corollary 3.8. The emptiness problem is decidable for VTAM$^=_{\neq}$.

A careful analysis of the proof of Theorem 3.2 yields an EXPTIME
complexity for this problem with VTAM$^=_{\neq}$.

Theorem 3.9. The membership problem is NP-complete for VTAM$^=_{\neq}$.

Proof. An NP algorithm is given in the proof of Theorem 3.5. For the NP-hardness, we use
a logspace reduction of 3-SAT. Let us consider an instance of 3-SAT with $n$ propositional
variables $X_1, \ldots, X_n$ and a conjunction of $m$ clauses:

$\bigwedge_{i=1}^{m} (a_{i,1} \vee a_{i,2} \vee a_{i,3})$

where every $a_{i,j}$ is either a variable $X_k$ ($k \leq n$) or a negation of a variable $\neg X_k$. We assume
wlog that every variable occurs at most once in a clause.

We consider an encoding $t$ of the given instance as a term over the signature $\Sigma$ containing
the symbols: $X_1, \ldots, X_n$ (constants), id, false, $\neg$ (unary) and $\wedge$ and $\vee$ (binary). The
encoding is:

$t := C_\wedge[\, C_\vee[\delta_{1,1}(X_1), \ldots, \delta_{1,n}(X_n)], \ldots, C_\vee[\delta_{m,1}(X_1), \ldots, \delta_{m,n}(X_n)] \,]$

where $C_\wedge$ (resp. $C_\vee$) is a context built solely with $\wedge$ (resp. $\vee$) and where every $\delta_{i,j}$ is either:

• $\delta_{i,j} = $ id (interpreted as the identity) if one of $a_{i,1}, a_{i,2}, a_{i,3}$ is $X_j$,

• $\delta_{i,j} = \neg$ if one of $a_{i,1}, a_{i,2}, a_{i,3}$ is $\neg X_j$,

• $\delta_{i,j} = $ false (interpreted as the constant function returning false) if $X_j$ does not occur in
$a_{i,1}, a_{i,2}, a_{i,3}$.

Now, let us partition the signature $\Sigma$ with: $X_1, \ldots, X_n, \vee \in$ PUSH, id, false, $\neg \in$ INT$_1$
and $\wedge \in$ INT$_1^=$; and let us consider the memory signature $\Gamma = \{0, 1, \vee\}$. We now construct a
VTAM$^=$ $A = (\Gamma, =, \{q_0, q_1\}, \{q_1\}, \Delta)$ whose transitions will, intuitively:

• guess an assignment for each constant symbol $X_k$ of $t$, by means of a non-deterministic
choice of one state $q_0$ or $q_1$,

• compute the value of $t$ with these assignments,

• push each tuple of assignments for each clause, in the contexts $C_\vee$,

• check the coherence of assignments by means of equality tests between the tuples pushed,
in the context $C_\wedge$.

More formally, we have the following transitions in $\Delta$:

$X_i \to q_0(0)$
$X_i \to q_1(1)$    with $i \leq n$
$\mathrm{id}(q_\varepsilon(y_1)) \to q_\varepsilon(y_1)$
$\mathrm{false}(q_\varepsilon(y_1)) \to q_0(y_1)$
$\neg(q_\varepsilon(y_1)) \to q_{1-\varepsilon}(y_1)$    with $\varepsilon \in \{0, 1\}$
$\vee(q_{\varepsilon_1}(y_1), q_{\varepsilon_2}(y_2)) \to q_{\varepsilon_1 \vee \varepsilon_2}(\vee(y_1, y_2))$
$\wedge(q_{\varepsilon_1}(y_1), q_{\varepsilon_2}(y_2)) \xrightarrow{y_1 = y_2} q_{\varepsilon_1 \wedge \varepsilon_2}(y_1)$    with $\varepsilon_1, \varepsilon_2 \in \{0, 1\}$

We can verify that the above VTAM$^=$ $A$ recognizes $t$ iff the instance of 3-SAT has a solution.

□
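The behaviour of the automaton of this reduction can be explored on small instances with the following Python sketch. It is an illustration only, under assumptions that are not part of the paper: clauses are lists of non-zero integers (a literal $X_j$ is written `j` and $\neg X_j$ is written `-j`), the contexts $C_\vee$ and $C_\wedge$ are taken to be right combs, and the set of all reachable configurations (state index, memory) is computed by brute force, which takes exponential time. The names `build_term`, `run` and `accepts` are hypothetical.

```python
from itertools import product


def build_term(clauses, n):
    """Encode a CNF over X1..Xn as the term t of the reduction."""
    def delta(clause, j):
        x = "X%d" % j
        if j in clause:
            return ("id", x)
        if -j in clause:
            return ("not", x)
        return ("false", x)

    def comb(symbol, items):
        # right comb built solely with `symbol` (one possible shape of context)
        term = items[-1]
        for item in reversed(items[:-1]):
            term = (symbol, item, term)
        return term

    rows = [comb("or", [delta(c, j) for j in range(1, n + 1)]) for c in clauses]
    return comb("and", rows)


def run(term):
    """Set of configurations (epsilon, memory) reachable at the root of `term`."""
    if isinstance(term, str):                  # X_i -> q_0(0) or q_1(1)
        return {(0, "0"), (1, "1")}
    head = term[0]
    if head in ("id", "not", "false"):         # unary INT_1 symbols keep the memory
        confs = run(term[1])
        if head == "id":
            return confs
        if head == "not":
            return {(1 - e, m) for e, m in confs}
        return {(0, m) for e, m in confs}
    left, right = run(term[1]), run(term[2])
    if head == "or":                           # PUSH: store both assignments
        return {(e1 | e2, ("or", m1, m2))
                for (e1, m1), (e2, m2) in product(left, right)}
    # head == "and": fires only if both clauses used the same assignment
    return {(e1 & e2, m1)
            for (e1, m1), (e2, m2) in product(left, right) if m1 == m2}


def accepts(term):
    """The only final state is q_1."""
    return any(e == 1 for e, _ in run(term))
```

For instance, `accepts(build_term([[1, 2], [-1, 2]], 2))` holds because the assignment $X_1 = 0$, $X_2 = 1$ is pushed identically by both clauses, whereas the term built from `[[1], [-1]]` is rejected: the equality test forces the same value of $X_1$ in both clauses.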

VTAM$^=_{\neq}$ is closed under union (using the same construction as before) but not under
complementation. This is a consequence of the following theorem.

Theorem 3.10. The universality problem is undecidable for VTAM$^=_{\neq}$.

Proof. We reduce the blank accepting problem for a deterministic Turing machine $\mathcal{M}$. As
in the proof of Theorem 3.6, we encode configurations of $\mathcal{M}$ as right-combs on a signature
$\Sigma$ containing the tape and state symbols of $\mathcal{M}$, considered as binary symbols of $\Sigma_{\mathrm{PUSH}}$, and
a constant symbol $\varepsilon$ in $\Sigma_{\mathrm{PUSH}}$. A sequence of configurations $c_0, c_1, \ldots, c_n$ (with $n > 1$) is
encoded as a tree $t = f(c_n, f(c_{n-1}, \ldots f(c_0, \varepsilon)))$, where $f$ is a binary symbol of
$\Sigma_{\mathrm{INT}_1^=}$. Such a tree is called a computation of $\mathcal{M}$ if $c_0$ is the initial configuration, $c_n$ is a
final configuration and for all $0 \leq i < n$, $c_{i+1}$ is the successor of $c_i$ by $\mathcal{M}$. Moreover, we
assume that all the $c_i$ have the same length (for this purpose we complete the
representations of configurations with blank symbols).

Figure 3: The VTAM$^=_{\neq}$ $A_3$ in the proof of Theorem 3.10.

Figure 4: The VTAM$^=_{\neq}$ $A_4$ in the proof of Theorem 3.10.

We want to construct a VTAM$^=_{\neq}$ $A$ which recognizes exactly the terms which are not
computations of $\mathcal{M}$. Hence, $A$ recognizes all the terms of $T(\Sigma)$ iff $\mathcal{M}$ does not accept the
initial blank configuration.

For the construction of $A$, let us first observe that we can associate to $\mathcal{M}$ a VTAM $A_n$
which, while reading a configuration $c_i$, will push on the memory its successor $c_{i+1}$. The
existence of such an automaton is guaranteed first by the fact that for each regular binary
relation $R$, as defined in Section 3.3, there exists a VTAM which, for each $(s, t) \in R$, will
push $t$ while reading $s$, and second by the fact that the language of the $c_i \otimes c_{i+1}$, hence the
relation of successor configuration, is regular. Moreover, since only push operations are
performed, we can ensure that $A_n$ satisfies the visibility condition. Let $q_n$ denote the final
state (which is assumed unique wlog) of the VTAM $A_n$. We also use the following VTAMs:

$A_\forall$: a VTAM with (unique) final state $q_\forall$ which, while reading a configuration $c_i$, will push
on the memory any configuration with the same length as $c_i$,

$A_=$: a VTAM with final state $q_=$ which, while reading a configuration $c_i$, will push $c_i$ on the
memory,

$A_B$: a VTAM with final state $q_B$ which, while reading a configuration $c_i$, will push on the
memory a configuration with the same length as $c_i$ and containing only blank symbols.

The VTAM$^=_{\neq}$ $A$ is the union of the following automata:

$A_1$: a VTAM$^=_{\neq}$ recognizing the terms of $T(\Sigma)$ which are not representations of sequences
of configurations (malformed terms). Its language is actually a regular tree language.

$A_2$: a VTAM$^=_{\neq}$ recognizing the sequences of configurations $f(c_n, f(c_{n-1}, \ldots f(c_0, \varepsilon)))$ such
that $c_0$ is not initial or $c_n$ is not final. Again, this is a regular tree language.

$A_3$: a VTAM$^=_{\neq}$ recognizing the sequences of configurations with two configurations of
different lengths. It contains the transition rules of $A_B$ and the additional transitions described
in Figure 3, which perform this test.

$A_4$: a VTAM$^=_{\neq}$ recognizing the sequences of configurations $f(c_n, f(c_{n-1}, \ldots f(c_0, \varepsilon)))$ such
that all the $c_i$ have the same length but there exists $0 \leq i < n$ such that $c_{i+1}$ is not the
successor of $c_i$ by $\mathcal{M}$. This last VTAM$^=_{\neq}$ contains the transitions of $A_n$, $A_\forall$, $A_=$, and the
additional transitions described in Figure 4.

With the transition rules in Figure 4, the automaton $A_4$ guesses an $i < n$ and, while
reading each of the configurations $c_j$ with $j \leq i$, it pushes the successor configuration of $c_j$,
say $c'_j$ (second column of Figure 4). Then, while reading $c_{i+1}$, $A_4$ pushes $c_{i+1}$, and it checks
that $c'_i$ and $c_{i+1}$ differ. After that, when reading each of the remaining configurations, $A_4$
pushes $c_{i+1}$ (third column of Figure 4).

The VTAM$^=_{\neq}$ $A_1$ to $A_4$ cover all the cases of a term of $T(\Sigma)$ not being an accepting
computation of $\mathcal{M}$ starting with the initial blank configuration. Hence the language of their
union $A$ is $T(\Sigma)$ iff $\mathcal{M}$ does not accept the initial blank configuration. □

Corollary 3.11. VTAM$^=_{\neq}$ is not effectively closed under complementation.

Proof. It is a consequence of Corollary 3.8 (emptiness decision) and Theorem 3.10. □

3.4.2. Structural Constraints. Lemma 3.7 also applies to another class VTAM$^\equiv_{\not\equiv}$, where $\equiv$
denotes structural equality of terms, defined recursively as the smallest equivalence relation
on ground terms such that:

• $a \equiv b$ for all $a, b$ of arity 0,

• $f(s_1, s_2) \equiv g(t_1, t_2)$ if $s_1 \equiv t_1$ and $s_2 \equiv t_2$, for all $f, g$ of arity 2.

Note that it is a regular relation, and that it satisfies the hypothesis of Lemma 3.7 and
condition ii of Theorem 3.2.
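The recursive definition of structural equality translates into a short test. This Python sketch assumes a representation that is not in the paper: constants are strings and binary terms are tuples `(f, t1, t2)`. The name `struct_eq` is hypothetical.

```python
def struct_eq(s, t):
    """Structural equality: same tree shape, labels are ignored."""
    s_const, t_const = isinstance(s, str), isinstance(t, str)
    if s_const or t_const:
        return s_const and t_const        # two constants are always related
    # two binary terms are related iff their children are related pairwise
    return struct_eq(s[1], t[1]) and struct_eq(s[2], t[2])
```

Two terms are related exactly when they differ only in their labels, so each equivalence class contains the finitely many relabelings of one shape over the finite memory signature.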

Corollary 3.12. The emptiness problem is decidable for VTAM$^\equiv_{\not\equiv}$.

Following the procedure in the proof of Theorem 3.2, we obtain a 2-EXPTIME
complexity for this problem and this class.

The crucial property of the relations $\equiv$ and $\not\equiv$ is that, unlike the relations of the above
class VTAM$^=_{\neq}$ or of the general VTAM$^R_{\neg R}$, they ignore the labels of the contents of the
memory. They only depend on the structure of these memory terms. A benefit of this property
of VTAM$^\equiv_{\not\equiv}$ is that the complexity of the membership problem drops to PTIME for this class.

Theorem 3.13. The membership problem is decidable in PTIME for VTAM$^\equiv_{\not\equiv}$.

Proof. Let $A = (\Gamma, \equiv, Q, Q_f, \Delta)$ be a VTAM$^\equiv_{\not\equiv}$ on $\Sigma$ and let $t$ be a term in $T(\Sigma)$. Let $sub(t)$
be the set of subterms of $t$ and let us construct a VTAM $A' = (\Gamma, sub(t) \times Q, \{t\} \times Q_f, \Delta')$ on
$\Sigma'$ where the symbols of $\Sigma'$ and $\Sigma$ are the same, and we assume that the symbols in category
INT$_1^\equiv$ (resp. INT$_2^\equiv$) in the partition of $\Sigma$ are in INT$_1$ (resp. INT$_2$) in the partition of $\Sigma'$.
The transitions of $\Delta'$ are obtained by the following transformation of the transitions of $\Delta$.
We only describe the construction for the cases INT$_1$ and INT$_1^\equiv$ with positive constraints.
The other cases are similar.

• for every $f_7(q_1(y_1), q_2(y_2)) \to q(y_1) \in \Delta$, we add to $\Delta'$ all the transitions
$f_7(\langle q_1, t_1 \rangle(y_1), \langle q_2, t_2 \rangle(y_2)) \to \langle q, f_7(t_1, t_2) \rangle(y_1)$ such that $f_7(t_1, t_2) \in sub(t)$,

• for every $f_9(q_1(y_1), q_2(y_2)) \xrightarrow{y_1 \equiv y_2} q(y_1) \in \Delta$, we add to $\Delta'$ all the transitions as above (in
this case, $f_9$ is assumed to be a symbol of category INT$_1$ in $\Sigma'$) such that moreover
$struct(t_1) = struct(t_2)$, where $struct(s)$ is defined, as in the proof of Theorem 3.5, as the shape
(unlabeled tree) that the memory of $A$ will have after $A$ has processed $s$.

The VTAM $A'$ can be computed in time polynomial in $\|t\|$ and $\|A\|$. It recognizes at most one
term, $t$, and it recognizes $t$ iff $A$ recognizes $t$. Therefore, $t$ is recognized by $A$ iff the language
of $A'$ is not empty. This can be decided in PTIME according to Theorem 2.5. □

Even more interesting, the construction for determinization of Section 2.3 still works
for VTAM$^\equiv_{\not\equiv}$.

Theorem 3.14. For every VTAM$^\equiv_{\not\equiv}$ $A = (\Gamma, \equiv, Q, Q_f, \Delta)$ there exists a deterministic
VTAM$^\equiv_{\not\equiv}$ $A^{det} = (\Gamma^{det}, \equiv, Q^{det}, Q_f^{det}, \Delta^{det})$ such that $L(A) = L(A^{det})$, where $|Q^{det}|$ and
$|\Gamma^{det}|$ both are $O(2^{|Q|^2})$.

Proof. We use the same construction as in the proof of Theorem 2.3, with a direct extension
of the construction for INT to INT$^\equiv$. The key property for handling constraints is that the
structure of the memory (hence the result of the structural tests) is independent of the
non-deterministic choices of the automaton. With the visibility condition, it only depends on
the term read. □

Theorem 3.15. The class of tree languages of VTAM$^\equiv_{\not\equiv}$ is closed under Boolean operations.
One can construct VTAM$^\equiv_{\not\equiv}$ for the union, intersection and complement of given
VTAM$^\equiv_{\not\equiv}$ languages whose sizes are respectively linear, quadratic and exponential in the
size of the initial VTAM$^\equiv_{\not\equiv}$.

Proof. We use the same constructions as in Theorem 2.4 (VTAM) for union and
intersection. For the intersection, in the case of constrained rules we can safely keep the
constraints in product rules, thanks to the visibility condition (as the structure of the memory
only depends on the term read, see the proof of Theorem 3.14). For instance, the product of
the INT$_1^\equiv$ rules $f_9(q_{11}(y_1), q_{12}(y_2)) \xrightarrow{y_1 \equiv y_2} q_1(y_1)$ and $f_9(q_{21}(y_1), q_{22}(y_2)) \xrightarrow{y_1 \equiv y_2} q_2(y_1)$ is
$f_9(\langle q_{11}, q_{21} \rangle(y_1), \langle q_{12}, q_{22} \rangle(y_2)) \xrightarrow{y_1 \equiv y_2} \langle q_1, q_2 \rangle(y_1)$. The product of two rules with
negative constraints is constructed similarly. We do not need to consider the product of a rule
with a positive constraint and a rule with a negative constraint, because in this case the
product is empty (no rule is added to the VTAM$^\equiv_{\not\equiv}$ for intersection). For the
complementation, we use Theorem 3.14 and completion. □

Corollary 3.16. The universality and inclusion problems are decidable for VTAM$^\equiv_{\not\equiv}$.

Proof. This is a consequence of Corollary 3.12 and Theorem 3.15. □



3.5. Constrained PUSH Transitions. Above, we always considered constraints in
transitions with INT symbols only. We did not consider a constrained extension of the rules
PUSH. The main reason is that symbols of a new category PUSH$^\equiv$, which test two memories
for structural equality and then push a symbol on top of them, permit us to construct
a constrained VTAM $A$ whose memory language $M(A, q)$ is the set of well-balanced binary
trees. This language is not regular, whereas the basis of our emptiness decision procedure
is the regularity of these languages (Theorem 3.2, Lemma 3.7) for the classes
considered.
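The effect of such a constrained PUSH can be observed with a small Python experiment. This is a sketch under assumptions that are not in the paper: a single hypothetical rule pushes the memory symbol `h` on top of two memories whenever they are structurally equal, constants are strings, and binary memories are tuples `(h, m1, m2)`. The names `push_equal_round` and `height_if_complete` are hypothetical.

```python
def struct_eq(s, t):
    """Structural equality of memories: same shape, labels ignored."""
    if isinstance(s, str) or isinstance(t, str):
        return isinstance(s, str) and isinstance(t, str)
    return struct_eq(s[1], t[1]) and struct_eq(s[2], t[2])


def push_equal_round(memories):
    """Apply the hypothetical constrained PUSH once to every pair of known memories."""
    result = set(memories)
    for m1 in memories:
        for m2 in memories:
            if struct_eq(m1, m2):            # the test performed before pushing
                result.add(("h", m1, m2))
    return result


def height_if_complete(m):
    """Height of m if it is a complete (well-balanced) binary tree, else None."""
    if isinstance(m, str):
        return 0
    left, right = height_if_complete(m[1]), height_if_complete(m[2])
    if left is None or left != right:
        return None
    return left + 1
```

Starting from constants and iterating `push_equal_round`, every memory produced is a complete binary tree, which matches the observation that the memory language obtained is the non-regular set of well-balanced trees.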

3.6. Contexts as Symbols and Signature Translations. Before looking for some ex- 
amples of VTAMp languages, we show a "trick" that (seemingly) adds expressiveness to 
VTAMp. One symbol can perform either a PUSH or a POP operation, or make an INT 
transition (constrained or not), but it cannot combine several of these operations. Here, we 
propose a way to combine several operations in one symbol, and thus increase the expres- 
siveness of VTAMp, without losing the good properties of this class. 
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The trick is to replace symbols by contexts. For instance a context §2(91 {■,■), go) can 
replace a symbol of arity 2. Assume that (72 is a PUSH symbol, gi is an INTi symbol with 
test, and go is an INTq symbol. This context first performs a test on the memories of the 
sons, and then a PUSH operation on the memory kept by gi (and on the _L leaf created by 
go). Such a combination is normally not possible, and replacing symbols by contexts brings 
a lot of additional expressiveness. 

Here is how we proceed precisely: we want to recognize a language (on a signature $\Sigma$)
with a VTAM, and we then have to choose a category for each symbol of the signature
(PUSH, POP$_{11}$, INT$_1^{\equiv}$, ...). As we will see in the examples below, it might be useful in
practice to have some extra categories combining the powers of two or more categories of
VTAM$^{\equiv}_{\not\equiv}$. We can still do that with VTAM$^{\equiv}_{\not\equiv}$, by means of an encoding of the terms of $T(\Sigma)$.
More precisely, we replace some symbols of the initial signature $\Sigma$ by contexts built with
new symbols. For instance, we replace a $g \in \Sigma$, which will perform the complex operation
described above, by the context $g_2(g_1(\cdot,\cdot), g_0)$. Then, we will have to ensure that the new
symbols (in our example $g_0$, $g_1$ and $g_2$) are only used to form the contexts encoding the
symbols of $\Sigma$. This can easily be done with local information maintained in the state
of the automaton. The set of well-formed terms, built with new symbols organized in
allowed contexts, is a regular tree language. We will call the VTAM$^{\equiv}_{\not\equiv}$ signature obtained a
translation of the initial signature. If $L$ is a tree language on $\Sigma$, then $c(L)$ is the translation
of $L$.
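
The translation $c$ can be illustrated on concrete terms. The following is only a minimal sketch under stated assumptions: terms are encoded as nested Python tuples, and the names `HOLE`, `CONTEXTS`, `plug` and `translate` are hypothetical helpers introduced for this illustration (only the symbols `g`, `g0`, `g1`, `g2` and `a` come from the running example).

```python
# Terms are nested tuples: ("g", t1, t2) for a binary node, ("a",) for a constant.
# HOLE marks the argument positions of a context (illustrative encoding).
HOLE = object()

# Hypothetical translation table: the symbol g of the initial signature is
# replaced by the context g2(g1(., .), g0) built from new symbols.
CONTEXTS = {
    "g": ("g2", ("g1", HOLE, HOLE), ("g0",)),
}


def plug(context, args):
    """Fill the holes of a context, left to right, with the terms produced by args."""
    if context is HOLE:
        return next(args)
    head, *subs = context
    return (head, *[plug(s, args) for s in subs])


def translate(term):
    """Compute c(term): translate the sons first, then expand the root symbol."""
    head, *sons = term
    new_sons = [translate(s) for s in sons]
    if head in CONTEXTS:
        return plug(CONTEXTS[head], iter(new_sons))
    return (head, *new_sons)
```

For instance, `translate(("g", ("a",), ("a",)))` builds the term `("g2", ("g1", ("a",), ("a",)), ("g0",))`, in which the new symbols only occur in the allowed context.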

In summary, we have shown here a general method for adding new categories of symbols
corresponding to (relevant) combinations of operations of VTAM$^{\equiv}_{\not\equiv}$, and hence defining
extensions of VTAM$^{\equiv}_{\not\equiv}$ with the same good properties as VTAM$^{\equiv}_{\not\equiv}$. By relevant, we mean
that some combinations are excluded, for instance PUSH and a constraint $\equiv$ at the same
time (see the paragraph above). Such forbidden combinations cannot be handled by our method.
With similar encodings, we can deal with symbols of arity bigger than 2, e.g. $g(\cdot,\cdot,\cdot)$ can
be replaced by $g_2(\cdot, g_1(\cdot,\cdot))$.

Note however first that this encoding concerns the recognized tree, not the memories. 
For instance, it is not possible to systematically encode the syntactic equality as structural 
equality (on memories) in this way. And indeed, the decision results are drastically different 
in the two cases. 

Also note that, even if $c(L)$ is accepted by a VTAM, which implies that $\neg c(L)$ is also
accepted by a VTAM, it may well be the case that $c(\neg L)$ is not recognized by a VTAM.
So, the above trick does not show that we can extend our results to a wider class of tree
languages.

3.7. Some VTAM$^{\equiv}_{\not\equiv}$ Languages. The regular tree languages and VPL are particular cases
of VTAM languages. We present in this section some other examples of relevant tree
languages translatable, using the method of Section 3.6, into VTAM$^{\equiv}_{\not\equiv}$ languages.

Well balanced binary trees. The VTAM$^{\equiv}_{\not\equiv}$ with memory signature $\{f, \bot\}$, state set $\{q, q_f\}$,
unique final state $q_f$, and whose rules follow accepts the (non-regular) language of well
balanced binary trees built with $g$ and $a$.

Here $a$ is a constant in $\Sigma_{\mathrm{INT}_0}$, and $g$ is in a new category, and is translated into the
context $g_2(g_1(\cdot,\cdot), g_0)$, where $g_2 \in \Sigma_{\mathrm{PUSH}}$, $g_1 \in \Sigma_{\mathrm{INT}_1^{\equiv}}$, and $g_0 \in \Sigma_{\mathrm{INT}_0}$.

$a \to q_f(\bot)$ \qquad $g_1(q_f(y_1), q_f(y_2)) \xrightarrow{y_1 \equiv y_2} q(y_1)$

$g_0 \to q_0(\bot)$ \qquad $g_2(q(y_1), q_0(y_2)) \to q_f(f(y_1, y_2))$
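
The four rules of this example can be simulated directly. The sketch below is an illustration only, under the following assumptions: translated terms and memories are encoded as nested tuples, the structural test $\equiv$ is implemented as equality of tree shapes, and the function names `run`, `same_shape` and `accepts` are hypothetical.

```python
BOT = ("bot",)  # the memory constant written as bottom in the text


def same_shape(m1, m2):
    """Structural equality of memories: same tree shape, labels ignored."""
    return len(m1) == len(m2) and all(
        same_shape(x, y) for x, y in zip(m1[1:], m2[1:])
    )


def run(term):
    """Return (state, memory) reached on a translated term, or None if no rule applies."""
    head, *sons = term
    if head == "a":                      # a -> q_f(bot)
        return ("qf", BOT)
    if head == "g0":                     # g0 -> q_0(bot)
        return ("q0", BOT)
    results = [run(s) for s in sons]
    if len(results) != 2 or None in results:
        return None
    (s1, m1), (s2, m2) = results
    if head == "g1" and s1 == s2 == "qf" and same_shape(m1, m2):
        return ("q", m1)                 # INT_1 rule guarded by the test on memories
    if head == "g2" and s1 == "q" and s2 == "q0":
        return ("qf", ("f", m1, m2))     # PUSH rule
    return None


def accepts(term):
    """A translated term is accepted when the run ends in the final state q_f."""
    result = run(term)
    return result is not None and result[0] == "qf"
```

In this encoding the memory reached on a balanced tree of height $h$ is a comb of depth $h$, so the test on memories succeeds exactly when both sons have the same height.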

Powerlists. A powerlist [18] is roughly a list of length $2^n$ (for $n > 0$) whose elements are
stored in the leaves of a balanced binary tree. For instance, the elements may be integers
represented in unary notation with the unary successor symbol $s$ and the constant $0$, and
the balanced binary tree on top of them can be built with a binary symbol $g$. This
data structure has been used in [18] to specify data-parallel algorithms based on a
divide-and-conquer strategy and recursion (e.g. Batcher's merge sort and fast Fourier transform).

Following the above construction, it is easy to characterize translations of powerlists
with a VTAM$^{\equiv}_{\not\equiv}$. We do not push on the "leaves", i.e. on the elements of the powerlist, and
compute in the higher part (the complete binary tree) as above.

Some equational properties of algebraic specifications of powerlists have been studied in
the context of automatic induction theorem proving and sufficient completeness [17]. Tree
automata with constraints have been acknowledged as a very powerful formalism in this
context (see e.g. [9]). We therefore believe that a characterization of powerlists (and their
complement language) with VTAM$^{\equiv}_{\not\equiv}$ is useful for the automated verification of algorithms
on this data structure.

Red-black trees. A red-black tree is a binary search tree following these properties: 

(1) every node is either red or black, 

(2) the root node is black, 

(3) all the leaves are black, 

(4) if a node is red, then both its sons are black, 

(5) every path from the root to a leaf contains the same number of black nodes. 

The first four properties are local and can be checked with standard TA rules. The
fifth property makes the language of red-black trees non-regular, and we need VTAM$^{\equiv}_{\not\equiv}$ rules to
recognize it. It can be checked by pushing all the black nodes read. We use for this purpose
a symbol $\mathit{black} \in \Sigma_{\mathrm{PUSH}}$.

When a red node is read, the numbers of black nodes in both its sons are checked to be
equal (by a test $\equiv$ on the corresponding memories) and only one corresponding memory is
kept. This is done with a symbol $\mathit{red} \in \Sigma_{\mathrm{INT}_1^{\equiv}}$.

When a black node is read, the equality of the numbers of black nodes in its sons must also
be tested, and a black must moreover be pushed on top of the memory kept. It means
that two operations must be combined. We can do that by defining an appropriate context
with the method of Section 3.6.
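
The memory discipline described for red and black nodes can be sketched as a bottom-up check. This is an illustration under assumptions, not the automaton itself: trees are tuples `(color, left, right)` with black leaves `("nil",)`, leaves also push one black, the ordering of keys required of a binary search tree is ignored, and the names `push`, `check` and `is_red_black` are hypothetical.

```python
BOT = ()  # empty memory


def push(memory):
    """Push one 'black' on top of a memory (memories are unary stacks here)."""
    return ("black", memory)


def check(tree):
    """Return (color, memory) for a valid subtree, or None if a property fails."""
    if tree == ("nil",):
        return ("black", push(BOT))          # property (3): leaves are black
    color, left, right = tree
    res_left, res_right = check(left), check(right)
    if res_left is None or res_right is None:
        return None
    (c1, m1), (c2, m2) = res_left, res_right
    if m1 != m2:                             # test on memories: same black count
        return None
    if color == "red":
        if c1 != "black" or c2 != "black":   # property (4), a local check
            return None
        return ("red", m1)                   # keep one memory, push nothing
    return ("black", push(m1))               # test and push combined


def is_red_black(tree):
    """Properties (1) to (5) on shape and colors; the root must be black."""
    result = check(tree)
    return result is not None and result[0] == "black"
```

Since memories are unary stacks in this sketch, structural equality coincides with plain equality, which is why `m1 != m2` is enough for the test.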

In [15], a special class of tree automata is introduced and used in a procedure for the
verification of C programs which handle balanced tree data structures, like red-black trees.
Based on the above example, we think that, following the same approach, VTAM$^{\equiv}_{\not\equiv}$ can also
be used for similar purposes.

BTINT$_1$ \quad $f_{13}(q_1(y_1), q_2(y_2)) \xrightarrow{1 = 2} q(y_1)$ \quad $f_{13} \in \Sigma_{\mathrm{BTINT}_1}$

BTINT$_2$ \quad $f_{14}(q_1(y_1), q_2(y_2)) \xrightarrow{1 = 2} q(y_2)$ \quad $f_{14} \in \Sigma_{\mathrm{BTINT}_2}$

BTINT$_1$ \quad $f_{15}(q_1(y_1), q_2(y_2)) \xrightarrow{1 \neq 2} q(y_1)$ \quad $f_{15} \in \Sigma_{\mathrm{BTINT}_1}$

BTINT$_2$ \quad $f_{16}(q_1(y_1), q_2(y_2)) \xrightarrow{1 \neq 2} q(y_2)$ \quad $f_{16} \in \Sigma_{\mathrm{BTINT}_2}$

Figure 5: New transition categories for BTVTAM$^{R}_{\neg R}$.



4. Visibly Tree Automata with Memory and Structural Constraints and Bogaert-Tison Constraints

In Section 3, we have only considered VTAM with constraints testing the memory
contents. In this section, we go a bit further and add to VTAM$^{R}_{\neg R}$ some Bogaert-Tison
constraints [5], i.e. equality and disequality tests between brother subterms in the term
read by the automaton.

We consider two new categories for the symbols, which we call BTINT$_1$ and BTINT$_2$, for
"Bogaert-Tison Internal". A transition with a symbol in one of these categories will make no
test on the memory contents, but rather an equality or disequality test between the brother
subterms directly under the current position of computation. In Figure 5, we describe the
new transition categories. We use the same notation as in [4] for the constraints. Note
that again, we only allow Bogaert-Tison constraints in internal rules.

For instance, if $f_{13}(t_1,t_2)$ is a subterm of the input tree, and if $t_1$ leads to $q_1(m_1)$ and
$t_2$ to $q_2(m_2)$, then the transition rule $f_{13}(q_1(y_1), q_2(y_2)) \xrightarrow{1=2} q(y_1)$, of type BTINT$_1$, can be
applied at this position iff $t_1 = t_2$.
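
The applicability condition of a BTINT$_1$ rule can be written down as a small function. This is a sketch under assumptions: the tuple encodings of rules and of the information carried by each son are choices made for this illustration, and `apply_btint1` is a hypothetical name.

```python
def apply_btint1(rule, son1, son2):
    """Try to apply a BTINT_1 rule at a node f(t1, t2).

    rule  = (q1, q2, positive, q) encodes f(q1(y1), q2(y2)) -> q(y1), guarded by
            the test t1 = t2 when positive is True and t1 != t2 otherwise.
    son_i = (t_i, state_i, memory_i): the i-th subterm and what the run reached on it.
    Returns (q, memory_1) if the rule applies, else None.
    """
    q1, q2, positive, q = rule
    (t1, s1, m1), (t2, s2, _m2) = son1, son2
    if (s1, s2) != (q1, q2):
        return None                  # the states of the left member do not match
    if (t1 == t2) != positive:
        return None                  # the Bogaert-Tison test fails
    return (q, m1)                   # the memory of the first son is kept, untested
```

The test compares the brother subterms themselves, never the memories, which is what distinguishes these categories from the constrained INT categories.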

Definition 4.1. A visibly tree automaton with memory and constraints and Bogaert-Tison
tests (BTVTAM$^{R}_{\neg R}$) on a signature $\Sigma$ is a tuple $(\Gamma, R, Q, Q_f, \Delta)$ where $\Gamma$, $Q$, $Q_f$ are defined
as for TAM, $R$ is an equivalence relation on $T(\Gamma)$ and $\Delta$ is a set of rewrite rules in one of
the above categories: PUSH, POP$_{11}$, POP$_{12}$, POP$_{21}$, POP$_{22}$, INT$_0$, INT$_1$, INT$_2$, INT$^{R}_1$, INT$^{R}_2$,
BTINT$_1$, BTINT$_2$.

The acceptance of terms of $T(\Sigma)$ and the languages of terms and memories are defined and
denoted as in Section 2.1.

The definition of complete BTVTAM$^{R}_{\neg R}$ is the same as before. Every BTVTAM$^{R}_{\neg R}$
can be completed (with a polynomial overhead) by the addition of a trash state $q_\bot$ (the
construction is similar to the one for VTAM$^{R}_{\neg R}$ in Section 3.1).

The definition of deterministic BTVTAM$^{R}_{\neg R}$ is based on the same conditions as for
VTAM$^{R}_{\neg R}$ for the function symbols in categories PUSH$_0$, PUSH, POP$_{11}$, ..., POP$_{22}$, INT$_1$,
INT$_2$, INT$^{R}_1$, INT$^{R}_2$. For the function symbols of BTINT$_1$, BTINT$_2$, we use the same kind
of conditions as for INT$^{R}_1$, INT$^{R}_2$: for all $f \in \Sigma_{\mathrm{BTINT}_1} \cup \Sigma_{\mathrm{BTINT}_2}$, for all $q_1, q_2 \in Q$, there are
at most two rules in $\Delta$ with left member $f(q_1(y_1), q_2(y_2))$, and if there are two, then their
constraints have different signs.

Theorem 4.2. For every BTVTAM$^{\equiv}_{\not\equiv}$ $A = (\Gamma, \equiv, Q, Q_f, \Delta)$ there exists a deterministic
BTVTAM$^{\equiv}_{\not\equiv}$ $A^{det} = (\Gamma^{det}, \equiv, Q^{det}, Q_f^{det}, \Delta^{det})$ such that $L(A) = L(A^{det})$, where $|Q^{det}|$ and
$|\Gamma^{det}|$ both are $O(2^{|Q|^2})$.
Proof. We use, again, the same construction as in the proof of Theorem 2.3, with a direct
extension of the construction for INT to INT$^{\equiv}$ and BTINT. As mentioned in Theorem 3.14,
the extension works for INT$^{\equiv}$ because the results of the tests are independent of the
non-deterministic choices of the automaton. For BTINT it is exactly the same (the brother
terms are not changed by the automaton!). □

Theorem 4.3. The class of tree languages of BTVTAM$^{\equiv}_{\not\equiv}$ is closed under Boolean operations.

Proof. We use the same constructions as in Theorem 2.4 for union and intersection. For the
intersection, as in Theorem 3.15, the constraints (even Bogaert-Tison tests) can be safely
kept in product rules, thanks to the visibility condition. For the complementation, we
determinize with Theorem 4.2, complete the automaton, and exchange final and non-final states. □

The proof of the following theorem follows the same idea as the proof for Bogaert-Tison 
automata [1], but we need here to take care of the structural constraints on the memory 
contents. A consequence is that the complexity of emptiness decision is much higher. 

Theorem 4.4. The emptiness problem is decidable for BTVTAM$^{\equiv}_{\not\equiv}$.

Proof. Let $A$ be a BTVTAM$^{\equiv}_{\not\equiv}$. First we determinize it into $A^{det}$ and assume that $A^{det}$ is
also complete. Then, we delete the BTINT$_1$ rules of the form $f(q_1(y_1), q_2(y_2)) \xrightarrow{1=2} q(y_1)$
with $q_1$ distinct from $q_2$ (idem for BTINT$_2$ rules) because they cannot be used (the automaton
is deterministic, so one term cannot lead to two different states).

For the same reason, we change each BTINT$_1$ rule of the form $f(q_1(y_1), q_2(y_2)) \xrightarrow{1 \neq 2} q(y_1)$
with $q_1$ distinct from $q_2$ (idem for BTINT$_2$ rules) into the same rule but without the
disequality test: $f(q_1(y_1), q_2(y_2)) \to q(y_1)$.

We call the newly obtained automaton $A^{new}$. It is still deterministic and recognizes
the same language as $A^{det}$. Actually, the careful reader may notice that $A^{new}$ is not a true
BTVTAM$^{\equiv}_{\not\equiv}$, because some unconstrained rules may involve symbols in BTINT in this
automaton. However, it is just an intermediate step in the construction of another automaton
$A'$ below.

Now, we consider the remaining BTINT$_1$ or BTINT$_2$ rules with negative Bogaert-Tison
constraints, which are of the form $f(q_i(y_1), q_i(y_2)) \xrightarrow{1 \neq 2} q(y_1)$ (or $q(y_2)$). We denote them
by $R_1, \ldots, R_i, \ldots, R_N$, and denote by $q_i$ the state in the left member of $R_i$, for each $i \leq N$.
We also denote the corresponding BTINT$_1$ or BTINT$_2$ rules with positive constraints by
$S_1, \ldots, S_i, \ldots, S_N$. Note that, since $A^{det}$ is deterministic and complete, we can associate to each rule of BTINT$_j$ whose
constraint is negative a unique rule of BTINT$_j$ with a positive constraint and the same
states in its left member. So, the state in the left member of $S_i$ is the same $q_i$ as for $R_i$.

It is important to notice that if a rule $R_i$ can effectively be used, then there must exist
two distinct terms leading to the state $q_i$ (we will call them witnesses). If not, the rule can
be removed.

So, our purpose is now to find, for each rule $R_i$, whether two witnesses exist or not.
We let $\mathcal{R}$ be initially $\{R_1, \ldots, R_N\}$. Suppose that at least one $R_i$ rule can be used, and
consider a run on a term $t$ that uses such a rule. We consider an innermost application of
a rule $R_i$ in this run on a subterm $f(t_1,t_2)$. The run on $t_1$ and the run on $t_2$ both lead to
the state $q_i$, without any use of an $R_j$ rule.

Let us remove all the $R_i$ rules from $A^{new}$, and remove all the equality tests in the
$S_i$ rules. Let $A'$ be the resulting automaton. It is a deterministic VTAM$^{\equiv}_{\not\equiv}$ (considering the
symbols in BTINT as INT symbols in this new automaton), and each term in $L(A', q_i)$ can
be transformed (we will call it BT-transformation) into a term in $L(A^{new}, q_i)$: each time
we use a modified $S_i$ rule, for instance of type BTINT$_1$, on a subtree $f(t_1,t_2)$, we replace
$t_2$ with $t_1$ so that the equality test is satisfied (and the resulting memory is unchanged).
Important: all the replacements must be performed bottom-up.
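
The bottom-up replacement can be sketched as follows. This is a simplified illustration, not the construction of the proof: it assumes that every occurrence of a symbol listed in `bt1_symbols` (resp. `bt2_symbols`) is processed by a modified positive rule of type BTINT$_1$ (resp. BTINT$_2$), whereas in the proof this depends on the states reached by $A'$. The function name and the tuple encoding of terms are hypothetical.

```python
def bt_transform(term, bt1_symbols, bt2_symbols):
    """Rebuild a term so that every (formerly) tested node has equal sons.

    Terms are nested tuples (symbol, *sons). The sons are transformed first,
    so the copies made at a node are already transformed: this is the
    bottom-up order required in the text.
    """
    head, *sons = term
    sons = [bt_transform(s, bt1_symbols, bt2_symbols) for s in sons]
    if head in bt1_symbols:
        sons = [sons[0], sons[0]]   # BTINT_1 keeps the first memory: copy the left son
    elif head in bt2_symbols:
        sons = [sons[1], sons[1]]   # BTINT_2 keeps the second memory: copy the right son
    return (head, *sons)
```

Copying the son whose memory is kept is what leaves the resulting memory unchanged, and working bottom-up ensures that a copy is never invalidated by a later replacement inside it.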

The proof of the emptiness decidability of VTAM$^{\equiv}_{\not\equiv}$ (Corollary 3.12) is constructive,
hence if we choose a reachable state $q_j$, we can find a term in $L(A', q_j)$ leading to this state, and
then convert it into a witness. So, we can find a first witness $t_A \in L(A^{det}, q_j)$.

If no witness can be found, then all the $R_i$ rules are useless and we can definitely
remove them all. Otherwise, we still need to find another witness, and if there is at least
one such other witness, then one of them can be recognized without using an $R_i$ rule. We
can construct a VTAM$^{\equiv}_{\not\equiv}$ recognizing all the terms whose BT-transformation leads to $t_A$.
To design it, we read $t_A$ top-down (knowing the state of $A'$ at each node), and each time
we see a subterm $f(t_1,t_2)$ to which a modified $S_i$ rule has to be applied, for instance a
modified BTINT$_1$ (resp. BTINT$_2$) rule, the right (resp. left) son of $f$ only needs to be a
term in $L(A', q_i)$, and the left (resp. right) son of $f$ only needs to be BT-transformed into
$t_1$ (resp. $t_2$). Once this VTAM$^{\equiv}_{\not\equiv}$ is constructed, we can combine it with $A'$ in order to
obtain a VTAM$^{\equiv}_{\not\equiv}$ recognizing all the terms leading $A'$ to $q_j$ (the state reached by $A'$ on $t_A$)
except the terms whose BT-transformation is $t_A$. Then we find another term in $L(A', q_j)$
(if it exists) and its BT-transformation is not $t_A$: it is actually another witness $t_B$.

When we have two witnesses for a rule $R_j$, we remove it from $\mathcal{R}$, and we add this rule
$R_j$ to $A'$, but without the disequality test. The automaton $A'$ keeps its good property: a
term $t$ leading $A'$ to some state $q$ can be BT-transformed into a term leading $A^{new}$ to state
$q$: when we "meet" the use of a rule formerly in the set $\mathcal{R}$ on $f(t_1, t_1)$ during the bottom-up
exploration of $t$, we replace the right son (for a rule that was of type BTINT$_1$ with a negative
constraint) or the left son (otherwise) by a witness different from $t_1$, so that the disequality
test is satisfied. Note that even if $t_1$ is a witness, we can do so because we have found two
witnesses.

With the new rule in $A'$, we look for two witnesses for some remaining $R_i$ rule. Again, we
can show that if a couple of witnesses exists, then at least one couple can be found without
any use of the remaining $R_i$ rules. When we find a first witness $t_A$ for a remaining rule $R_j$,
we can find another one (if it exists) using approximately the same technique as previously:
we read $t_A$ top-down, and when we see a rule formerly in $\mathcal{R}$ used on $f(t_1,t_2)$ (e.g. a rule
formerly of type BTINT$_1$ with a negative constraint), we just go on recursively, saying that
the left son must be a term whose BT-transformation is $t_1$, and the right son must be either:

• a term whose BT-transformation is $t_2$,

• or, if our BT-transformation would change $f(t_1,t_1)$ into $f(t_1,t_2)$, a term whose
BT-transformation is $t_1$.

As previously, we construct a VTAM$^{\equiv}_{\not\equiv}$, fully using the Boolean closure of this class, that
recognizes the terms in $L(A', q_j)$ (where $q_j$ is the state reached by $A'$ on $t_A$), except those whose
BT-transformation is $t_A$, and therefore we can find another witness $t_B$ (if it exists).

We continue to use this method, finding couples of witnesses, until there is no rule in
the set $\mathcal{R}$ anymore, or until we are not able to find a new couple of witnesses anymore: in
the latter case, we remove the remaining $R_i$ rules because they are useless.

So, now we use the final version of $A'$ obtained in order to find a term leading to a final
state, and since we have a couple of witnesses for each rule formerly in the set $\mathcal{R}$, we can
BT-transform it into a term accepted by $A^{new}$ (hence by $A$). If such a term does not exist,
the language recognized by $A^{det}$ (i.e. the language recognized by $A$) is empty. □

5. Conclusion 

Having a tree memory structure instead of a stack is sometimes more relevant (even
when the input function symbols are only of arities 1 and 0). We have shown how to extend
the visibly pushdown languages to such memory structures, keeping the determinization and
closure properties of VPL. Our second contribution is then to extend this automaton model,
constraining the transition rules with some regular conditions on memory contents. The
structural equality and disequality tests appear to be a good class of constraints since
we then have both decidability of emptiness and Boolean closure properties. Moreover,
they can be combined (while keeping decidability and closure results) with equality and
disequality tests à la [3], operating on brother subterms of the term read.

Several further studies can be done on the automata of this paper. For instance, the
problem of the closure of the corresponding tree languages under certain classes of term
rewriting systems is particularly interesting, as it can be applied to the verification of
infinite state systems with regular model checking techniques. It could be interesting as well
to study how the definition of VTAM can be extended to deal with unranked trees, with the
perspective of applications to problems related to semi-structured document processing.

Appendix: Two-way tree automata with structural equality constraints
are as expressive as standard tree automata.

In this section, we complete the proof of Lemma 3.7. We show actually a more general
result: we consider two-way alternating tree automata with some regular constraints and
show that the language they recognize is also accepted by a standard tree automaton. This
generalizes the proof for two-way alternating tree automata (see e.g. [8] chapter 7) and the
proof for two-way automata with equality tests [7], which itself relies on a transformation
from two-way automata to one-way automata [6].

Two-way automata are, as usual, automata that can move up and down, and alternation
consists (as usual) in spawning two copies of the tree in different states, requiring
acceptance of both copies. In the logical formalism, alternation simply corresponds to
clauses $q_1(x), q_2(x) \Rightarrow q(x)$, requiring to accept $x$ both in state $q_1$ and in state $q_2$ if one
wants to accept $x$ in state $q$.

For simplicity, we assume that all function symbols have arity 0 or 2. Lexical conventions:

• $f, g, h, \ldots$ range over symbols of arity 2. Unless explicitly stated, they may denote
identical symbols.

• $a, b, c, \ldots$ range over constants.

• $x, x_1, \ldots, x_i, \ldots, y, \ldots, y_i, z, \ldots, z_i, \ldots$ are (universally quantified) first-order variables.

• $S, S_1, S_2, \ldots, S_i, \ldots$ range over state symbols of a fixed given tree automaton.

• $Q, Q_1, Q_2, \ldots$ range over state symbols of the tree automaton with memory.

• $R, R_1, R_2, \ldots$ range over state symbols of the binary recognizable relations.

We assume that the $R_i$ are recognizable relations defined by clauses with heads of the form:

(A) \quad $R(a, f(x,y))$ \qquad $R_3(f(x_1,y_1), g(x_2,y_2))$ \qquad $S(f(x,y))$

We assume wlog that there is a state $S_\top$ in which all trees are accepted (a "trash state").
Moreover, we will need in what follows an additional property of the $R_i$'s:

$\forall i, j, \exists k, l, \quad R_i(x,y) \wedge R_j(y,z) \models R_k(x,y) \wedge R_l(x,z)$

This property is satisfied by the structural equivalence, for which there is only one index
$i$: $R_i$ is $\equiv$, and we have indeed

$x \equiv y \wedge y \equiv z \models x \equiv y \wedge x \equiv z$

It is also satisfied by the universal binary relation and by the equality relation. That is why
this generalizes corresponding results of [HIT].

Our automata are defined by a finite set of clauses of the form:

(1) $Q_1(y_1), Q_2(y_2), R(y_1,y_2) \Rightarrow Q_3(y_1)$

(2) $Q_1(y_1), Q_2(y_2) \Rightarrow Q_3(f(y_1,y_2))$

(2b) $\Rightarrow Q_1(a)$

(3) $Q_1(f(y_1,y_2)), Q_2(y_3) \Rightarrow Q_3(y_1)$

(4) $Q_1(f(y_1,y_2)), Q_2(y_3) \Rightarrow Q_3(y_2)$

These clauses have a least Herbrand model. We write $[\![Q]\!]$ for the interpretation of $Q$ in
this model. This is the language recognized by the automaton in state $Q$.

The goal is to prove that, for every $Q$, $[\![Q]\!]$ is recognized by a finite tree automaton. We
use a selection strategy with splitting, and complete the rules (1)-(4) above. We show that
the completion terminates and that we get out of it a tree automaton which accepts exactly
the memory contents. Splitting will introduce nullary predicate symbols (propositional
variables).

We consider the following selection strategy. Let $E_1$ be the set of literals which contain
at least one function symbol and $E_2$ be the set of negative literals.

(1) If the clause contains a negative literal $\neg R(u, v)$ or a negative literal $\neg S(u)$ where either
$u$ or $v$ is not a variable, then select such literals only. This case is ruled out in what follows.

(2) If the clause contains at least one negated propositional variable, select the negated
propositional variables only. This case is ruled out in what follows.

(3) If $E_1 \cap E_2 \neq \emptyset$, then select $E_1 \cap E_2$.

(4) If $E_1 \neq \emptyset$ and $E_1 \cap E_2 = \emptyset$, then select $E_1$.

(5) If $E_1 = \emptyset$ and $E_2 \neq \emptyset$, then select the negative literals $\neg R(x,y)$ and $\neg S(x)$ if any,
otherwise select $E_2$.

(6) Otherwise, select the only literal of the clause.

In what follows (and precedes), selected literals are underlined. 
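
The six cases of the selection strategy can be read as a decision procedure on a single clause. The sketch below is only an illustration under assumptions: the `Literal` encoding, the convention that a non-variable argument is a tuple, and the parameter `rs_predicates` (the names of the $R$ and $S$ predicates) are all hypothetical choices, and case (1) is read as "some argument is not a variable".

```python
from typing import NamedTuple


class Literal(NamedTuple):
    positive: bool
    pred: str
    args: tuple  # each argument: a variable name (str) or a tuple (symbol, *subterms)


def has_function_symbol(lit):
    """A literal belongs to E1 when one of its arguments is not a variable."""
    return any(isinstance(arg, tuple) for arg in lit.args)


def select(clause, rs_predicates):
    """Return the list of selected literals of a clause (a list of Literal)."""
    negative = [l for l in clause if not l.positive]            # E2
    functional = [l for l in clause if has_function_symbol(l)]  # E1
    # (1) negative R or S literals with a non-variable argument
    case1 = [l for l in negative
             if l.pred in rs_predicates and has_function_symbol(l)]
    if case1:
        return case1
    # (2) negated propositional variables (predicates without arguments)
    case2 = [l for l in negative if not l.args]
    if case2:
        return case2
    both = [l for l in functional if not l.positive]            # E1 and E2
    if both:                                                    # (3)
        return both
    if functional:                                              # (4)
        return functional
    if negative:                                                # (5)
        rs = [l for l in negative if l.pred in rs_predicates]
        return rs or negative
    return list(clause)                                         # (6)
```

With this reading, the head is selected in a push clause, the literal with the function symbol is selected in a pop clause, and the literal on the relation is selected in a constrained internal clause.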

We introduce the procedure by starting to run the completion with the selection
strategy, before showing the general form of the clauses we get.

First, clauses of the form (3), (4) are replaced (using splitting) with clauses of the form

(3) $Q_1(f(y_1,y_2)), NE_{Q_2} \Rightarrow Q_3(y_1)$

(4) $Q_1(f(y_1,y_2)), NE_{Q_2} \Rightarrow Q_3(y_2)$

(s1) $Q_2(x) \Rightarrow NE_{Q_2}$

Overlapping (s1) and (2, 2b) may yield clauses of the form

(s2) $NE_{Q_1}, NE_{Q_2} \Rightarrow NE_{Q_3}$

(s3) $\Rightarrow NE_{Q}$

together with new clauses of the form (s1). Eventually, we may reach, using (s3) and (3-4),
clauses:

(3b) $Q_1(f(y_1,y_2)) \Rightarrow Q_3(y_1)$

(4b) $Q_1(f(y_1,y_2)) \Rightarrow Q_3(y_2)$
(1) + (2) yields clauses of the form

(5.1) $Q_1(y_1), Q_2(y_2), Q_3(g(y_3,y_4)), R_1(y_1,y_3), R_2(y_2,y_4) \Rightarrow Q_4(f(y_1,y_2))$

(5.2) $Q_1(y_1), Q_2(y_2), Q_3(a), S_1(y_1), S_2(y_2) \Rightarrow Q_4(f(y_1,y_2))$

(5.3) $Q_1(a) \Rightarrow Q_2(b)$

(5.4) $S_1(y_1), S_2(y_2), Q_1(f(y_1,y_2)) \Rightarrow Q_2(a)$

(2) + (3b) and (2) + (4b) yield clauses of the form (after splitting):

(6) $NE_{Q_3}, Q_1(y_1) \Rightarrow Q_2(y_1)$

and eventually

(6b) $Q_1(y_1) \Rightarrow Q_2(y_1)$

(5.1) + (2) yields

(7.1) $Q_1(y_1), Q_2(y_2), Q_3(y_3), Q_4(y_4), R_1(y_1,y_3), R_2(y_2,y_4) \Rightarrow Q_5(f(y_1,y_2))$

We split (7.1): we introduce new predicate symbols $Q_i^{R_j}$ defined by

$Q_i(y), R_j(x,y) \Rightarrow Q_i^{R_j}(x)$

Then clause (7.1) becomes:

(7.1) $Q_1(y_1), Q_2(y_2), Q_3^{R_1}(y_1), Q_4^{R_2}(y_2) \Rightarrow Q_5(f(y_1,y_2))$

(5.2) + (2b) yields clauses of the form

(7.2) $Q_1(y_1), Q_2(y_2), S_1(y_1), S_2(y_2) \Rightarrow Q_3(f(y_1,y_2))$

(6b) + (2) yields new clauses of the form (2). (7.1) + (5.1) yields clauses of the form:

(8.1) $Q_1(y_1), Q_2(y_2), Q_3^{R_3}(y_1), Q_4^{R_4}(y_2), Q_5(y_3), Q_6(y_4), R_1(y_3,y_1), R_2(y_4,y_2) \Rightarrow Q_7(f(y_3,y_4))$

At this point, we use the property of the relations $R_i$ and split the clause:

$\exists y_1.\ Q_1(y_1) \wedge Q_3^{R_3}(y_1) \wedge R_1(y_3,y_1) \models Q_1^{R_k}(y_3) \wedge Q_3^{R_l}(y_3)$

Hence clauses (8.1) can be rewritten into clauses of the form:

(8.1) $Q_1^{R_k}(y_1), Q_3^{R_l}(y_1), Q_5(y_1), Q_2^{R_{k'}}(y_2), Q_4^{R_{l'}}(y_2), Q_6(y_2) \Rightarrow Q_7(f(y_1,y_2))$

Finally, we let $\mathcal{Q}$ be the set of predicate symbols consisting of

• Symbols $S_i$

• Symbols $Q_i$

• Symbols $Q_i^{R_j}$

For every subset $S$ of $\mathcal{Q}$, we introduce a propositional variable $NE_S$. Clauses are split,
introducing new propositional variables (or predicate symbols $Q_i^{R_j}$) in such a way that in
all clauses except split clauses, the variables occurring on the left also occur on the right
of the clause. And, in split clauses, there is only one variable occurring on the left and not
on the right.

We let $C$ be the set of clauses obtained by repeated applications of resolution with
splitting, with the above selection strategy (a priori $C$ could be infinite). We claim that all
generated clauses are of one of the following forms (where the $P_i$'s and the $P'_i$'s belong to
$\mathcal{Q}$; the $Q$ states might actually be $Q_i^{R_j}$).

1. Pop clauses (the original clauses, which are not subsumed by the new clauses):

(3) $Q_1(f(y_1,y_2)), NE_{Q_3} \Rightarrow Q_2(y_1)$

(4) $Q_1(f(y_1,y_2)), NE_{Q_3} \Rightarrow Q_2(y_2)$

(3b) $Q_1(f(y_1,y_2)) \Rightarrow Q_2(y_1)$

(4b) $Q_1(f(y_1,y_2)) \Rightarrow Q_2(y_2)$

Note that clause (1) is a particular case of the alternating clauses below, since it can
be written

$Q_1(y_1), Q_2^{R}(y_1) \Rightarrow Q_3(y_1)$

2. Push clauses.

(P1) $P_1(x), \ldots, P_n(x), P'_1(y), \ldots, P'_m(y) \Rightarrow Q(f(x,y))$

(P2) $\Rightarrow P(a)$

(P3) $NE_S, P_1(x), \ldots, P_n(x), P'_1(y), \ldots, P'_m(y) \Rightarrow Q(f(x,y))$

(P4) $NE_S \Rightarrow Q(a)$

3. Intermediate clauses.

(I1) $P_1(x), \ldots, P_n(x), P'_1(y), \ldots, P'_m(y), P''_1(f(x,y)), \ldots, P''_k(f(x,y)) \Rightarrow Q(f(x,y))$

(I2) $P_1(a), \ldots, P_n(a) \Rightarrow Q(a)$

(I3) $S_1(x_1), S_2(x_2), Q_1(a) \Rightarrow Q_2(g(x_1,x_2))$

(I4) $Q_1(a) \Rightarrow Q_2(b)$

4. Alternating clauses.

(A1) $NE_S, P_1(x), \ldots, P_n(x) \Rightarrow Q(x)$

(A2) $P_1(x), \ldots, P_n(x) \Rightarrow Q(x)$

In addition, we have clauses obtained by splitting:

5. Split clauses.

(S1) $R_j(x,y), Q_i(y) \Rightarrow Q_i^{R_j}(x)$

(S1b) $R_j(y,x), Q_i(y) \Rightarrow Q_i^{R_j^{-1}}(x)$

(S2) $R_1(x_1,y_1), R_2(x_2,y_2), Q_1(f(y_1,y_2)) \Rightarrow Q_1^{R}(g(x_1,x_2))$

(S3) $S_1(x), S_2(y), Q_1(f(x,y)) \Rightarrow Q_1^{R}(a)$

(S4) $P_1(x), \ldots, P_n(x) \Rightarrow NE_{\{P_1,\ldots,P_n\}}$

(S5) $P_1(x), \ldots, P_n(x), P'_1(y), \ldots, P'_m(y), P''_1(f(x,y)), \ldots, P''_k(f(x,y)) \Rightarrow NE_S$

6. Propositional clauses.

(E1) $NE_{S_1}, \ldots, NE_{S_n} \Rightarrow NE_S$

(E2) $\Rightarrow NE_S$

(E3) $P_1(a), \ldots, P_n(a) \Rightarrow NE_S$

Every resolution step using the selection strategy on two of the above clauses yields a
clause in the above set:

POP + PUSH: yields an alternating clause (A1) and a split clause (S4).

INT + PUSH: yields a push clause or an intermediate clause.

alternating + PUSH: yields an intermediate clause (I1) or (I2).

split + R: yields a split clause (S2) or (S3) or an intermediate clause (I3) or (I4).

(S2) + PUSH: yields clauses (S1) and push clauses. Note that here, we use the property of
the relation $R$ to split clauses, which may involve predicates $Q_i^{R_j}$.

(S3) + PUSH: yields a push clause and split clauses (S4).

(S4) + PUSH: yields split clauses (S5) or a propositional clause (E3).

(S5) + PUSH: yields split clauses (S5) or a propositional clause (E1).

It follows that all clauses of $C$ are of the above form. Since there are only finitely many
such clauses, $C$ is finite and computed in finite (exponential) time.

Now, we let $A$ be the alternating tree automaton defined by clauses (P1) and (P2)
(and automata clauses defining the $S$ states). Let, for any state $Q$, $[\![Q]\!]_A$ be the language
accepted in state $Q$ by $A$. We claim that $[\![Q]\!]_A = [\![Q]\!]$.

To prove this, we first show (the proof is omitted here) that $NE_{\{P_1,\ldots,P_n\}}$ is in $C$ iff

$[\![P_1]\!]_A \cap \ldots \cap [\![P_n]\!]_A \neq \emptyset$.

Then observe that $[\![Q]\!]$ is also the interpretation of $Q$ in the least Herbrand model of $C$:
indeed, all computations yielding $C$ are correct. Since $[\![Q]\!]_A \subseteq [\![Q]\!]$ is trivial, we only have
to prove the converse inclusion. For every $t \in [\![Q]\!]$, there is a proof of $Q(t)$ using the clauses
in $C$.

Assume, by contradiction, that there is a term $t$ and a predicate symbol $Q$ such that
all proofs of $Q(t)$ using the clauses in $C$ involve at least one clause which is not an automaton
clause. Then, considering an appropriate sub-proof, there is a term $u$ and a predicate
symbol $P$ such that all proofs of $P(u)$ involve at least one non-automaton clause and there
is a proof of $P(u)$ which uses exactly one non-automaton clause, at the last step of the
proof.

We investigate all possible cases for the last clause used in the proof of $P(u)$ and derive
a contradiction in each case.

Clause (I1): The last step of the proof is

$$\frac{P_1(u_1), \ldots, P_n(u_1), \; P'_1(u_2), \ldots, P'_m(u_2), \; P''_1(f(u_1,u_2)), \ldots, P''_k(f(u_1,u_2))}{P(f(u_1,u_2))}$$

and we assume $u = f(u_1,u_2)$. Assume also that, among the proofs we consider, $k$ is
minimal. (If $k = 0$ then we have a push clause, which is supposed not to be the case.)

By hypothesis, for all $i$, $u_1 \in [\![P_i]\!]_A$, $u_2 \in [\![P'_i]\!]_A$ and $f(u_1,u_2) \in [\![P''_i]\!]_A$. In particular,
the last clause used in the proof of $P''_k(u)$,

$Q_1(x), \ldots, Q_r(x), Q'_1(y), \ldots, Q'_s(y) \Rightarrow P''_k(f(x,y))$

belongs to $C$. Then, overlapping this clause with the above clause (I1), the following clause
also belongs to $C$:

$P_1(x), \ldots, P_n(x), Q_1(x), \ldots, Q_r(x), P'_1(y), \ldots, P'_m(y), Q'_1(y), \ldots, Q'_s(y), P''_1(f(x,y)), \ldots, P''_{k-1}(f(x,y)) \Rightarrow P(f(x,y))$

and therefore we have another proof of $P(u)$:

$$\frac{P_1(u_1), \ldots, P_n(u_1), Q_1(u_1), \ldots, Q_r(u_1), \; P'_1(u_2), \ldots, P'_m(u_2), Q'_1(u_2), \ldots, Q'_s(u_2), \; P''_1(f(u_1,u_2)), \ldots, P''_{k-1}(f(u_1,u_2))}{P(f(u_1,u_2))}$$

which contradicts the minimality of $k$.
Clause (A1): The last step of the proof is

$$\frac{P_1(u), \ldots, P_n(u)}{P(u)}$$

By hypothesis, the proofs of $P_i(u)$ only use automata clauses: $\forall i.\ u \in [\![P_i]\!]_A$. Let the push
rule

$Q_1(x), \ldots, Q_m(x), Q'_1(y), \ldots, Q'_p(y) \Rightarrow P_n(f(x,y))$

be the last clause used in the proof of $P_n(u)$. Overlapping this clause and the clause (A1)
above, there is another clause in $C$ yielding a proof of $P(u)$:

$Q_1(x), \ldots, Q_m(x), Q'_1(y), \ldots, Q'_p(y), P_1(f(x,y)), \ldots, P_{n-1}(f(x,y)) \Rightarrow P(f(x,y))$

And we are back to the case of (I1).
Clause (3b): The last step of the proof is

$$\frac{Q_1(f(u,t))}{P(u)}$$

By hypothesis $f(u,t) \in [\![Q_1]\!]_A$. Hence there is a push clause

$P_1(x), \ldots, P_n(x), P'_1(y), \ldots, P'_m(y) \Rightarrow Q_1(f(x,y))$

such that $u \in [\![P_1]\!]_A \cap \ldots \cap [\![P_n]\!]_A$ and $t \in [\![P'_1]\!]_A \cap \ldots \cap [\![P'_m]\!]_A$. By resolution on the clause
(3b), there is also in $C$ a clause

$P_1(x), \ldots, P_n(x), NE_{\{P'_1,\ldots,P'_m\}} \Rightarrow P(x)$

However, since $[\![P'_1]\!]_A \cap \ldots \cap [\![P'_m]\!]_A \neq \emptyset$, $NE_{\{P'_1,\ldots,P'_m\}}$ is also in $C$ and, by resolution again,

$P_1(x), \ldots, P_n(x) \Rightarrow P(x)$

is a clause of $C$.

Then we are back to the case of (A1).
Clause (3): The last step of the proof is

$$\frac{Q_1(f(u,t)) \quad NE_{Q_2}}{P(u)}$$

Since $NE_{Q_2} \in C$ in this case, by saturation of $C$, there is a clause $Q_1(f(x,y)) \Rightarrow P(x)$ in $C$,
and we are back to the case of (3b).
Other cases: they are quite similar to the previous ones. Let us only consider the case of
clause (S2), which is slightly more complicated. The last step of the proof is

$$\frac{R_1(u_1,v_1) \quad R_2(u_2,v_2) \quad Q_1(f(v_1,v_2))}{Q_1^{R}(g(u_1,u_2))}$$

Assume moreover that $u = g(u_1,u_2)$ is a minimal size term such that, for some $Q_i, R_j$,
$Q_i^{R_j}(u)$ is provable using as a last step an inference (S2), and is not provable by automata
clauses only.

As before, we consider the overlap between (S2) and a push clause. We get

$R_1(x_1,y_1), R_2(x_2,y_2), P_1(y_1), \ldots, P_n(y_1), P'_1(y_2), \ldots, P'_m(y_2) \Rightarrow Q_1^{R}(g(x_1,x_2))$

Hence, the following clauses belong to $C$ (when $P_j$, $P'_j$ are not themselves predicates $Q^{R}$;
otherwise, we have to use the property on the $R$ relations and split in another way,
as shown later):

$R_1(x_1,y_1), P_i(y_1) \Rightarrow P_i^{R_1}(x_1)$

$R_2(x_2,y_2), P'_i(y_2) \Rightarrow {P'_i}^{R_2}(x_2)$

and we have the following proof of $Q_1^{R}(g(u_1,u_2))$:

$$\frac{\dfrac{R_1(u_1,v_1) \quad P_1(v_1)}{P_1^{R_1}(u_1)} \;\cdots\; \dfrac{R_1(u_1,v_n) \quad P_n(v_n)}{P_n^{R_1}(u_1)} \quad \dfrac{R_2(u_2,w_1) \quad P'_1(w_1)}{{P'_1}^{R_2}(u_2)} \;\cdots\; \dfrac{R_2(u_2,w_m) \quad P'_m(w_m)}{{P'_m}^{R_2}(u_2)}}{Q_1^{R}(g(u_1,u_2))}$$

Now, by overlapping again $R_1(x_1,y_1)$ and $R_2(x_2,y_2)$ with their defining clauses, we
compute "shortcut clauses" belonging to $C$ and get another proof (for instance assuming
$v_1 = f(v_{11},v_{12})$ and $u_1 = h(u_{11},u_{12})$):

$$\frac{\dfrac{R_{11}(u_{11},v_{11}) \quad R_{12}(u_{12},v_{12}) \quad P_1(f(v_{11},v_{12}))}{P_1^{R_1}(u_1)} \;\cdots\; \dfrac{R_2(u_2,w_1) \quad P'_1(w_1)}{{P'_1}^{R_2}(u_2)} \;\cdots\; \dfrac{R_2(u_2,w_m) \quad P'_m(w_m)}{{P'_m}^{R_2}(u_2)}}{Q_1^{R}(g(u_1,u_2))}$$

By minimality of $u$, $u_1 \in [\![P_1^{R_1}]\!]_A$. Similarly, for every $i$, $u_1 \in [\![P_i^{R_1}]\!]_A$ and $u_2 \in [\![{P'_i}^{R_2}]\!]_A$,
and it follows that $g(u_1,u_2) \in [\![Q_1^{R}]\!]_A$.

Finally, let us consider the case where some $P_j$ is itself a predicate symbol $Q^{R}$, in
which case we do not have a predicate $(Q^{R})^{R_i}$. We then use the assumed property of the
predicates $R_i$: $R_i(x,y) \wedge R(y,z) \models R'_i(x,y) \wedge R'(x,z)$, hence

$(\exists u, \exists v.\ R_i(x,u) \wedge R(u,v) \wedge Q(v)) \models (\exists u.\ R'_i(x,u) \wedge S_\top(u)) \wedge (\exists v.\ R'(x,v) \wedge Q(v))$

Hence we need two split clauses instead of one:

$R'_i(x,y) \Rightarrow S_\top^{R'_i}(x)$

$R'(x,y), Q(y) \Rightarrow Q^{R'}(x)$

And $R_i(x_1,y_1), Q^{R}(y_1)$ is replaced with $S_\top^{R'_i}(x_1), Q^{R'}(x_1)$. Note that such a transformation
is not necessary when there is a single transitive binary relation, as in our application:
then $R(x,y) \wedge Q^{R}(y)$ is simply replaced with $Q^{R}(x)$.

To sum up: if there is a proof of $P(u)$ using clauses of $C$, then, by saturation of the
clauses of $C$ w.r.t. overlaps with push clauses, we can rewrite the proof into a proof using
push clauses only: $u \in [\![P]\!]_A$. This proves that $[\![P]\!] = [\![P]\!]_A$.

Finally, it is easy (and well-known) to compute a standard bottom-up automaton
accepting the same language as an alternating automaton; this only requires a subset
construction. That is why the language accepted by our two-way automata with structural
equality constraints is actually a recognizable language. The overall size of the resulting
automaton (and its computation time) is simply exponential, but we know that, already
for alternating automata, we cannot do better.
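
The subset construction mentioned here can be sketched for an alternating automaton given by push clauses of the forms (P1) and (P2). This is a minimal illustration under assumptions: the tuple encodings of clauses and terms, and the names `determinize` and `evaluate`, are hypothetical; states of the resulting bottom-up automaton are sets of predicates.

```python
def determinize(constants, binary_symbols, const_clauses, push_clauses):
    """Subset construction for an alternating automaton given by push clauses.

    const_clauses: set of (a, P)               for clauses  => P(a)
    push_clauses:  set of (f, left, right, Q)  for clauses
                   P1(x),...,Pn(x), P'1(y),...,P'm(y) => Q(f(x, y)),
                   with left = frozenset{P1..Pn} and right = frozenset{P'1..P'm}.
    Returns (states, delta_const, delta_bin); states are frozensets of predicates.
    """
    delta_const = {
        a: frozenset(p for (b, p) in const_clauses if b == a) for a in constants
    }
    states = set(delta_const.values())
    delta_bin = {}
    changed = True
    while changed:                       # saturate until no new transition appears
        changed = False
        for f in binary_symbols:
            for s1 in list(states):
                for s2 in list(states):
                    if (f, s1, s2) in delta_bin:
                        continue
                    # all heads whose premises hold on the two sons
                    target = frozenset(
                        q for (g, left, right, q) in push_clauses
                        if g == f and left <= s1 and right <= s2
                    )
                    delta_bin[(f, s1, s2)] = target
                    states.add(target)
                    changed = True
    return states, delta_const, delta_bin


def evaluate(term, delta_const, delta_bin):
    """Run the deterministic automaton: the set of predicates accepting the term."""
    if len(term) == 1:
        return delta_const[term[0]]
    f, t1, t2 = term
    return delta_bin[(f, evaluate(t1, delta_const, delta_bin),
                      evaluate(t2, delta_const, delta_bin))]
```

A term is then accepted in a state $P$ of the alternating automaton exactly when $P$ belongs to the set computed by `evaluate`, and the number of reachable sets is at most exponential in the number of predicates.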